18.676: Stochastic Calculus 


Lecturer: Professor Nike Sun 
Notes by: Andrew Lin 


Spring 2020 


Introduction 


Most of the logistical information is on the class website at [1], including an official class summary and many references 
to relevant papers and textbooks. Here are the main points for us: there will be homework roughly once every two 
weeks. The first two are already posted, and they'll be due February 12 and February 24 (submitted in class). Grading 
is weighted 55 percent for homework, 20 percent per exam, and 5 percent for attendance. Office hours are Monday 
2-4 in Professor Sun's office, 2-432. 

18.675 is a prereq, so we should talk to Professor Sun if we haven't taken that class. We will be using [2] as our
main textbook.

Today, we'll begin with an informal overview of the topics covered in this class. We'll start with some basic reminders:
the standard Gaussian density
\[ g(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \]
should be burned into our head, and the variable $Z \sim N(0,1)$ is distributed according to this density. We should
know that if $X, X_i$ are i.i.d. symmetric random signs ($\pm 1$ with equal probability), and $S_n = \sum_{i=1}^n X_i$, then $\frac{S_n}{\sqrt{n}}$ converges
in distribution to $N(0, 1)$. We should also know how to prove this, either using the central limit theorem or by direct
combinatorial calculation (because $S_n$ is a scaling of the binomial distribution).

Next, we can consider the simple random walk on the integers, which gives us a process $(S_n)_{n \ge 0}$ (where $n$ is a
time index): since $S_n/\sqrt{n}$ converges in distribution to a Gaussian, this means that over time $n$, the walk typically covers
a distance on the order of $\sqrt{n}$. So if we rescale time by $n$ and rescale space by $\sqrt{n}$, we get a process
\[ X^{(n)}(t) = \frac{1}{\sqrt{n}} S_{\lfloor nt \rfloor}. \]
For any fixed $t$, we still have the central limit theorem as before, which tells us that $X^{(n)}(t) \xrightarrow{d} N(0, t)$. But one
idea throughout this class will be that we don't need to consider a single $t$: the entire process $X^{(n)} = (X^{(n)}(t))_{t \ge 0}$
converges in distribution to $(B_t)_{t \ge 0}$, something called a Brownian motion, as $n \to \infty$. Note that we haven't defined
a Brownian motion yet, and we haven't described the topology in which this converges in distribution. But we'll do
everything more formally later on.
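The rescaling above can be checked numerically. The following is a minimal sketch (the function names and parameter choices are ours, not from the lecture) that samples $X^{(n)}(t) = S_{\lfloor nt \rfloor}/\sqrt{n}$ many times and estimates its variance, which should be close to $t$:

```python
import math
import random

def rescaled_walk_value(n, t, rng):
    """Return X^(n)(t) = S_{floor(n t)} / sqrt(n) for one simple random walk."""
    steps = math.floor(n * t)
    s = sum(1 if rng.random() < 0.5 else -1 for _ in range(steps))
    return s / math.sqrt(n)

def empirical_variance(n, t, samples, seed=0):
    """Estimate Var(X^(n)(t)); the mean is exactly 0 by symmetry."""
    rng = random.Random(seed)
    values = [rescaled_walk_value(n, t, rng) for _ in range(samples)]
    return sum(v * v for v in values) / samples
```

For instance, `empirical_variance(100, 0.5, 4000)` should land near 0.5, in line with the $N(0, t)$ limit for fixed $t$. This only probes one time at a time; it says nothing about convergence of the whole path.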


In short, here are some of the main goals of this class: 


* Formal construction of Brownian motion,
* Convergence of some natural processes (like simple random walk), which we can think of as a “functional CLT,”
* Calculations with Brownian motion (stochastic calculus).


For now, though, we'll keep surveying some more ideas from the course: we're going to talk a bit about Itô's
formula and give an application to the conformal invariance of planar Brownian motion.


Example 1 


First, we'll describe some properties of our Brownian motion $B_t$ given our informal definition above.

We should have $B_0 = 0$, and for any $0 \le s < t$, we should have $B_t - B_s \sim N(0, t - s)$. Also, for any
$0 \le s_1 < t_1 \le s_2 < t_2$, $B_{t_1} - B_{s_1}$ and $B_{t_2} - B_{s_2}$ should be independent (because they correspond to disjoint parts of
the random walk). And in fact, just these properties actually suffice to characterize Brownian motion completely.


Example 2 


Next, we'll do a conceptual overview of Itô's formula. Consider a process that evolves as
\[ dX_t = \mu_t\,dt + \sigma_t\,dB_t. \]
Informally, we can think of writing this as
\[ X_{t+dt} - X_t = \mu_t\,dt + \sigma_t \cdot N(0, dt). \]
Let $f : \mathbb{R} \to \mathbb{R}$ be a twice-differentiable function. If $X_t$ followed a deterministic smooth trajectory, then we would know
how $f(X_t)$ evolves, since we just have $df(X_t) = f'(X_t)\,dX_t$. But if we expand the stochastic version out, we instead
find that
\[ df(X_t) = f'(X_t)\,dX_t + \frac{1}{2} f''(X_t)(dX_t)^2 = f'(X_t)(\mu_t\,dt + \sigma_t\,dB_t) + \frac{f''(X_t)}{2}(\mu_t\,dt + \sigma_t\,dB_t)^2. \]
Because $dB_t$ is on the order of $\sqrt{dt}$, it dominates the $\mu_t\,dt$, so we can replace $(\mu_t\,dt + \sigma_t\,dB_t)^2$ with just $\sigma_t^2 (dB_t)^2 =
\sigma_t^2\,dt \cdot N(0,1)^2$. What Itô's formula says is basically that we can actually ignore the fluctuations in the $N(0, 1)^2$ term
if we take many measurements, and so that just disappears from the expression (it's 1 on average). Thus,
\[ df(X_t) = f'(X_t)(\mu_t\,dt + \sigma_t\,dB_t) + \frac{f''(X_t)}{2}\sigma_t^2\,dt = \left( f'(X_t)\mu_t + \frac{f''(X_t)\sigma_t^2}{2} \right) dt + f'(X_t)\sigma_t\,dB_t, \]
and we've now separated the contribution into a drift and a stochastic term.
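To see the second-order term at work, take $f(x) = x^2$ and $X = B$ (so $\mu = 0$, $\sigma = 1$), where the heuristic gives $d(B_t^2) = 2B_t\,dB_t + dt$. The sketch below is our own illustration on a discretized path (the helper names are hypothetical): the identity $B_T^2 = \sum 2B\,dB + \sum (dB)^2$ holds exactly by telescoping, and the point of the heuristic is that $\sum (dB)^2$ is close to $T$.

```python
import math
import random

def brownian_increments(steps, horizon, rng):
    """Independent N(0, dt) increments of a discretized Brownian path on [0, horizon]."""
    dt = horizon / steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(steps)]

def ito_sums(increments):
    """Return (B_T^2, sum of 2 B dB, sum of (dB)^2) along the discretized path."""
    b = 0.0
    stochastic_sum = 0.0
    quadratic_variation = 0.0
    for db in increments:
        stochastic_sum += 2.0 * b * db   # f'(B) dB, evaluated at the left endpoint
        quadratic_variation += db * db   # (dB)^2, which the heuristic replaces by dt
        b += db
    return b * b, stochastic_sum, quadratic_variation
```

With many steps, the third returned value concentrates near the time horizon, which is the discrete version of replacing $(dB_t)^2$ by $dt$.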

Using this, we can do an application to planar Brownian motion. First, we'll review a bit of complex analysis. If
we have a function $f : \mathbb{C} \to \mathbb{C}$ or $f : D \to \mathbb{C}$ for some open $D \subset \mathbb{C}$, then $f$ is holomorphic or complex differentiable at
$z \in \mathbb{C}$ if the complex derivative
\[ f'(z) = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h} \]
exists. (Being complex differentiable is much stronger than being differentiable in $\mathbb{R}^2$ because we approach 0 in all
directions in the complex plane.) If we think of our function as going from $\mathbb{R}^2 \to \mathbb{R}^2$, where $z = x + iy$ and $f = u + iv$
(so that $u, v$ are real-valued functions and $x, y$ are real numbers), then $f$ is holomorphic at $z$ if the limits from the real
axis and imaginary axis are the same, meaning that we require
\[ \frac{\partial f}{\partial x} = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \quad \text{and} \quad \frac{1}{i}\frac{\partial f}{\partial y} = \frac{1}{i}\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} \]
to be equal. Thus, we have the Cauchy-Riemann equations
\[ u_x = v_y, \qquad u_y = -v_x. \]
One useful thing to know is that the Laplacian of the real part of $f$ is
\[ \Delta u = u_{xx} + u_{yy} = v_{yx} - v_{xy} = 0, \]
which means that the real part of any complex differentiable function is harmonic (and so is the imaginary part by an
analogous calculation).
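As a quick sanity check of the Cauchy-Riemann equations and of harmonicity, here is the computation for $f(z) = z^2$ (an example of our own choosing, not one from the lecture):

```latex
\begin{align*}
f(z) &= (x + iy)^2 = (x^2 - y^2) + i\,(2xy), & u &= x^2 - y^2, \quad v = 2xy, \\
u_x &= 2x = v_y, & u_y &= -2y = -v_x, \\
\Delta u &= u_{xx} + u_{yy} = 2 - 2 = 0, & \Delta v &= v_{xx} + v_{yy} = 0 + 0 = 0.
\end{align*}
```

Both the real and imaginary parts are harmonic, as the general argument predicts.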


Example 3 


Consider a two-dimensional (standard) Brownian motion $(X_t, Y_t)$: this just means that $X_t$ and $Y_t$ are independent
standard one-dimensional Brownian motions. Alternatively, we can take $Z_t = X_t + iY_t$ to be a standard Brownian
motion in $\mathbb{C}$.

Suppose we have a conformal map $f : D \to D'$ (which means that $f$ is holomorphic and has a holomorphic inverse
$f^{-1} : D' \to D$). Again, let $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$.


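Question 4. How does $f(Z_t)$ evolve if $Z_t$ stops when it hits the boundary of $D$?

We'll need a two-dimensional version of Itô's formula for this, but the same Taylor expansion idea works:
\[ du(Z_t) = \begin{bmatrix} u_x(Z_t) & u_y(Z_t) \end{bmatrix} \begin{bmatrix} dX_t \\ dY_t \end{bmatrix} + \frac{1}{2} \begin{bmatrix} dX_t & dY_t \end{bmatrix} \begin{bmatrix} u_{xx}(Z_t) & u_{xy}(Z_t) \\ u_{yx}(Z_t) & u_{yy}(Z_t) \end{bmatrix} \begin{bmatrix} dX_t \\ dY_t \end{bmatrix}. \]
When we expand this out, we get cross terms like $dX_t \cdot dY_t$, which look like $dt \cdot N(0, 1)N(0, 1)$ (a product of two independent Gaussians, with mean zero). If we add up many
of these, they cancel out and become negligible, so we don't have to worry about those: this means the second term
will only have the diagonal term contributions
\[ \frac{1}{2}\left( u_{xx}(Z_t)(dX_t)^2 + u_{yy}(Z_t)(dY_t)^2 \right), \]
but now we can replace $(dX_t)^2$ and $(dY_t)^2$ with $dt$ by the same argument as above, and now $u_{xx} + u_{yy} = 0$ because $u$ is
harmonic. So the entire second-order term actually vanishes, and we're just left with (now doing the same calculations
for $v(Z_t)$)
\[ \begin{bmatrix} du(Z_t) \\ dv(Z_t) \end{bmatrix} = \begin{bmatrix} u_x(Z_t) & u_y(Z_t) \\ v_x(Z_t) & v_y(Z_t) \end{bmatrix} \begin{bmatrix} dX_t \\ dY_t \end{bmatrix}. \]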

The 2-by-2 matrix on the right-hand side can be thought of as
\[ G = \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} = \begin{bmatrix} u_x & u_y \\ -u_y & u_x \end{bmatrix} = \sqrt{u_x^2 + u_y^2} \cdot (\text{rotation matrix}), \]
where the middle equality uses the Cauchy-Riemann equations. But $\sqrt{u_x^2 + u_y^2}$ is just the modulus of $f'(z) = u_x + iv_x$, and the determinant of the rotation matrix has to be 1 (we
have a conformal map). So we can conclude that if $Z_t = X_t + iY_t$ is a standard Brownian motion in $D \subset \mathbb{C}$, and
$f : D \to D'$ is conformal, then as long as $Z_t$ is in $D$, $f(Z_t)$ evolves via
\[ \begin{bmatrix} du(Z_t) \\ dv(Z_t) \end{bmatrix} = |f'(Z_t)| \, O(Z_t) \begin{bmatrix} dX_t \\ dY_t \end{bmatrix}, \]
where $O(Z_t)$ is a $2 \times 2$ rotation matrix. Note that the standard bivariate Gaussian $N(0, I_{2 \times 2})$ is rotationally invariant
(spherically symmetric), so it is reasonable to believe that standard Brownian motion is also rotationally invariant: if
$O$ is a $2 \times 2$ orthogonal matrix and $Z$ is a BM in $\mathbb{R}^2$, then so is $OZ$.

We can also consider scaling: if $B_t$ is a Brownian motion, then $\sigma B_t$ is equal in distribution to $B_{\sigma^2 t}$. This is clearly
true for any fixed $t$ because they're both Gaussian with variance $\sigma^2 t$, but in fact the idea is that we actually can't tell the two images
(of the sample paths) apart. So the Brownian motion is a self-similar fractal.


Remark 5. Note that in the formula we've derived, the scale factor and the rotation depend on the given time. (This
means, for example, that when $|f'(Z_t)|$ is big, the process runs faster.) So the process itself is not conformally
invariant, but the trace (the image of the motion) is indeed conformally invariant.


With the rest of the time today, we'll give an example of a question that this class will let us answer: 


Example 6 


Consider a simple random walk on a grid with $\varepsilon$ spacing on $(\varepsilon\mathbb{Z}) \times (\varepsilon\mathbb{Z}_{\ge 0})$. Suppose that we start our walk near
$(0, y)$ and stop when we hit the horizontal axis. What is the law of the hitting location (the $x$-coordinate)?


We'll think about this problem when $\varepsilon$ is small, since we can approximate this walk with a Brownian motion in the
upper-half (complex) plane $\mathbb{H}$. One way to approach this is to map the problem into an easier domain: the function
$f(z) = \frac{z - iy}{z + iy}$ maps $\mathbb{H}$ conformally onto the unit disk $\mathbb{D}$, and our new starting point is now the origin. Since we only care
about the hitting location (and not the time), and the Brownian motion is spherically symmetric, the hitting location
must be uniform on $\partial\mathbb{D}$, which means the distribution on $\mathbb{H}$ can be easily recovered.

More explicitly, for any interval $[a, b] \subset \mathbb{R}$, we can calculate the probability of hitting within that interval explicitly:
the angle covered near $x$ is proportional to $\frac{|f'(x)|}{2\pi} = \frac{y}{\pi(x^2 + y^2)}$ (the factor of $2\pi$ coming from the length of the boundary
of the unit disk), meaning that, writing $\tau$ for the time at which the boundary is hit,
\[ P_{iy}(Z_\tau \in [a, b]) = P_0(Z_\tau \in f([a, b])) = \int_a^b \frac{y \, dx}{\pi(x^2 + y^2)}. \]
This last integrand is also called the Poisson kernel for $\mathbb{H}$ (we'll write it as $P_y(x) = \frac{y}{\pi(x^2 + y^2)}$), because it's closely
connected to the Dirichlet problem on $\mathbb{H}$, which goes as follows. If we're given a nice function $b : \mathbb{R} \to \mathbb{R}$ and we want
to know the harmonic interpolation of $b$ to $\mathbb{H}$, the answer is given by
\[ h(x, y) = E[b(Z_\tau) \mid Z_0 = x + iy]; \]
that is, we start a Brownian motion at $x + iy$ and find the expected value of $b$ when it hits the boundary. (We can
prove this by looking at a finite graph or with direct calculation, and we'll talk about it more later.) But explicitly, this
can be written as an integral
\[ E[b(Z_\tau + x) \mid Z_0 = iy] = \int_{\mathbb{R}} P_y(s) \, b(s + x) \, ds. \]
Since $P_y$ is symmetric, we can replace $s$ with $-s$, and thus this expected value becomes a convolution $(b * P_y)(x)$ with
the Poisson kernel. This is something we'll see come up in the future as well!
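The hitting probability can be evaluated in closed form, since $\int \frac{y\,dx}{\pi(x^2+y^2)} = \frac{1}{\pi}\arctan(x/y)$. The sketch below (function names are ours, and the midpoint rule is just one convenient way to cross-check) compares the arctangent formula with a direct numerical integral of the Poisson kernel:

```python
import math

def poisson_kernel(x, y):
    """Poisson kernel of the upper half-plane: P_y(x) = y / (pi (x^2 + y^2))."""
    return y / (math.pi * (x * x + y * y))

def hitting_probability(a, b, x0, y0):
    """P(Brownian motion started at x0 + i*y0 first hits the real axis in [a, b])."""
    return (math.atan((b - x0) / y0) - math.atan((a - x0) / y0)) / math.pi

def midpoint_integral(a, b, x0, y0, cells=20000):
    """Midpoint-rule integral of the shifted kernel P_{y0}(s - x0) over [a, b]."""
    h = (b - a) / cells
    return h * sum(poisson_kernel(a + (k + 0.5) * h - x0, y0) for k in range(cells))
```

For example, a walk started at $iy$ exits through $[-y, y]$ with probability $\frac{1}{\pi}(\arctan 1 - \arctan(-1)) = \frac{1}{2}$, which matches the picture in the disk: that interval maps to half of the unit circle.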
Today, we'll start to formalize some of the ideas from yesterday's informal overview. Specifically, we'll be starting with
Gaussian processes and Gaussian spaces. (We'll follow the textbook pretty closely for now.) All random variables will
live on a common probability space $(\Omega, \mathcal{F}, P)$.


Definition 7 


A d-dimensional Gaussian vector is an $\mathbb{R}^d$-valued random variable $X$ such that $\langle X, u \rangle$ is a one-dimensional
Gaussian variable for any $u \in \mathbb{R}^d$.


This is somewhat fancier than other definitions and doesn't depend on a choice of basis. In addition, this definition
doesn't specify that $\langle X, u \rangle$ and $\langle X, v \rangle$ need to be jointly Gaussian for vectors $u, v$, but we'll see that it is a consequence
of the definition.


Proposition 8 


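The law of $X$ is uniquely determined by the mean vector $\mu = E[X] \in \mathbb{R}^d$ and the covariance matrix $\Sigma =
E[(X - \mu)(X - \mu)^T] \in \mathbb{R}^{d \times d}$.


Proof. Take any $\theta \in \mathbb{R}^d$. By definition, $\langle X, \theta \rangle$ is Gaussian with some mean and variance, but we can compute those
from $\mu$ and $\Sigma$:
\[ E[\langle X, \theta \rangle] = \langle E[X], \theta \rangle = \langle \mu, \theta \rangle \]
by linearity of expectation, and
\[ \operatorname{Var}(\langle X, \theta \rangle) = \operatorname{Cov}(\langle X, \theta \rangle, \langle X, \theta \rangle) = \theta^T \Sigma \theta \]
because covariance is bilinear. This means that we know the full distribution of $\langle X, \theta \rangle$, which tells us the characteristic
function
\[ \varphi_X(\theta) = E[\exp(i \langle X, \theta \rangle)] = \exp\left( i \langle \mu, \theta \rangle - \frac{1}{2} \theta^T \Sigma \theta \right), \]
and as we saw in 18.675, knowing all of these characteristic functions is enough to determine the law of $X$.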


Because of this, we'll use the notation $X \sim N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ is a symmetric positive
semidefinite matrix (because variance is always nonnegative). Any such matrix has a Cholesky factorization $\Sigma = AA^T$,
where $A$ is a $d \times r$ matrix, and then if $Z$ is standard normal in $r$ dimensions (that is, iid standard Gaussian in each
component), then we have
\[ \mu + AZ \sim N(\mu, \Sigma). \]
(Indeed, we can check that this has the right mean and variance by expanding out the matrix multiplication for $AZ$
and using that $E[Z_i Z_j] = \delta_{ij}$.) In particular, we can write $\Sigma = AA^T$ for an invertible square matrix $A$ if and only if $\Sigma$
has full rank, and only in this case can we use the change of variables formula to find that $X$ has density
\[ f(x) = \frac{\exp\left( -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right)}{(2\pi)^{d/2} |\det \Sigma|^{1/2}}. \]
(If $r < d$, then the law of $X$ is supported on a subspace of $\mathbb{R}^d$ of lower dimension, so it has no density.)
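The recipe $\mu + AZ$ is also how one samples a Gaussian vector in practice. Here is a minimal pure-Python sketch, assuming $\Sigma$ is positive definite (full rank), so that a square lower-triangular $A$ exists; the function names are ours:

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular A with A A^T = sigma, for a symmetric positive definite sigma."""
    d = len(sigma)
    a = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(sigma[i][i] - partial)
            else:
                a[i][j] = (sigma[i][j] - partial) / a[j][j]
    return a

def sample_gaussian(mu, a, rng):
    """Draw mu + A Z, where Z has i.i.d. N(0, 1) coordinates."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(a[0]))]
    return [m + sum(row[k] * z[k] for k in range(len(z))) for m, row in zip(mu, a)]
```

Averaging products of coordinates over many draws of `sample_gaussian` should recover the entries of $\Sigma$, which is the statement $E[(AZ)(AZ)^T] = AA^T$. The rank-deficient case ($r < d$) would need a different factorization and is not handled here.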


One important fact is that for Gaussians, being independent and being uncorrelated are the same thing: 


Lemma 9 


If $X \sim N(\mu, \Sigma)$ is a d-dimensional Gaussian vector, then the coordinates $X_i$ are mutually independent if and only if $\Sigma$ is diagonal.


In contrast, there's the standard non-example where we consider $Z \sim N(0,1)$ and an independent $\varepsilon \sim \text{Unif}(\{\pm 1\})$. Then the
covariance between $\varepsilon Z$ and $Z$ is zero, but the two variables aren't independent because they always have the same
absolute value (basically, this goes wrong because the two variables aren't jointly Gaussian).
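This non-example is easy to see in simulation. The sketch below (our own illustration, with hypothetical helper names) draws pairs $(Z, \varepsilon Z)$ and checks both halves of the claim: the sample covariance is near zero, yet the absolute values always agree, so the pair is far from independent.

```python
import random

def sample_pairs(samples, seed=0):
    """Draw pairs (Z, eps * Z) with Z ~ N(0, 1) and eps an independent uniform sign."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(samples):
        z = rng.gauss(0.0, 1.0)
        eps = 1 if rng.random() < 0.5 else -1
        pairs.append((z, eps * z))
    return pairs

def empirical_covariance(pairs):
    """Sample covariance of the two coordinates."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
```

Each coordinate on its own is a standard Gaussian, so this also illustrates why the lemma needs joint Gaussianity and not just Gaussian marginals.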


Proof. The forward direction is easy (independent implies uncorrelated). For the other direction, suppose $\Sigma$ is a
diagonal matrix. Then the characteristic function
\[ \varphi_X(\theta) = \prod_{j=1}^d \exp\left( i \mu_j \theta_j - \frac{1}{2} \Sigma_{jj} \theta_j^2 \right) \]
has no cross-terms, so the characteristic function factorizes and thus the different components are independent as
desired.


With this, we'll move on to the idea of a Gaussian space on $(\Omega, \mathcal{F}, P)$. (For the rest of today, we'll assume
Gaussians are centered, meaning they have mean zero.) Recall that $L^2(\Omega, \mathcal{F}, P)$ is the space of all ($\mathbb{R}$-valued) random
variables on our probability space with finite second moment. This is a Hilbert space with inner product
\[ \langle X, Y \rangle = \int_\Omega X(\omega) Y(\omega) \, dP(\omega) = E[XY]. \]


Definition 10 


A (centered) Gaussian space is a closed linear subspace of $L^2(\Omega, \mathcal{F}, P)$ containing only centered Gaussian
variables.


Example 11
Take $X \sim N(0, \Sigma)$ in $\mathbb{R}^d$. Then the span of the coordinate random variables $\{X_1, \cdots, X_d\}$ is a Gaussian space.
Meanwhile, a non-example is the span of $Z \sim N(0,1)$ and $\varepsilon Z$ (where $\varepsilon$ is the random sign as before); this doesn't
work because $Z + \varepsilon Z$ is zero half the time and thus not Gaussian.


Gaussian spaces are important because they turn probability into geometry; for example, independence of Gaussian
variables becomes orthogonality in the space:


Theorem 12
Let $H \subset L^2(\Omega, \mathcal{F}, P)$ be a centered Gaussian space, and let $(H_\alpha)_{\alpha \in I}$ be linear subspaces of $H$. Then the $\sigma$-fields
$(\sigma(H_\alpha))_{\alpha \in I}$ are independent if and only if the $H_\alpha$ are pairwise orthogonal.


We can read the book for this: it's really a fancier version of Lemma 9. A related point is that conditional
expectation among Gaussians corresponds to orthogonal projection in a Gaussian space. By the way, the first hypothesis
is important here: we want the $H_\alpha$ to all be subspaces of a single centered Gaussian space to ensure that they
are jointly Gaussian.


Problem 13 


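Suppose we have a bivariate normal (two-dimensional Gaussian) given by
\[ \begin{bmatrix} X \\ Y \end{bmatrix} \sim N\left( 0, \begin{bmatrix} a & b \\ b & c \end{bmatrix} \right), \]
where we assume that the covariance matrix is positive definite. What is the law of $Y$ conditional on $X$?


The standard trick used here is to find some $\theta$ such that $Y - \theta X$ is independent from $X$. Here, $Y - \theta X$ is also
jointly Gaussian with $X$ and $Y$, so we just need to find what value of $\theta$ satisfies
\[ 0 = \operatorname{Cov}(Y - \theta X, X) = b - \theta a, \]
meaning we should set $\theta = \frac{b}{a}$. Then we can break $Y$ up into a “parallel” and a “perpendicular” part as
\[ Y = \frac{b}{a} X + \left( Y - \frac{b}{a} X \right), \]
where the first term is in $\sigma(X)$ and the second term (independent of $X$) is a Gaussian with variance
\[ \operatorname{Var}\left( Y - \frac{b}{a} X \right) = \operatorname{Cov}\left( Y - \frac{b}{a} X, Y \right) = c - \frac{b^2}{a}. \]
Putting everything together, the conditional distribution is given by
\[ Y \mid X \sim N\left( \frac{b}{a} X, \; c - \frac{b^2}{a} \right). \]
More generally (as an exercise for us), if
\[ \begin{bmatrix} X \\ Y \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \right), \]
where $X$ lives in a $k$-dimensional space and $Y$ lives in an $\ell$-dimensional space, then we can check that $Y \mid X$ is distributed
as $N(B^T A^{-1} X, \; C - B^T A^{-1} B)$.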
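The scalar computation is short enough to code directly. The sketch below (names are ours) returns the regression coefficient $\theta = b/a$ and the residual variance $c - b^2/a$, and samples $(X, Y)$ by drawing $X$ first and then adding an independent residual, which is the "parallel plus perpendicular" decomposition read backwards:

```python
import math
import random

def conditional_law(a, b, c):
    """Return (theta, residual variance) so that Y | X ~ N(theta * X, residual variance)."""
    theta = b / a
    return theta, c - b * b / a

def sample_xy(a, b, c, rng):
    """Draw (X, Y) ~ N(0, [[a, b], [b, c]]) as X plus an independent residual."""
    theta, resid_var = conditional_law(a, b, c)
    x = rng.gauss(0.0, math.sqrt(a))
    y = theta * x + rng.gauss(0.0, math.sqrt(resid_var))
    return x, y
```

For $a = 2$, $b = 1$, $c = 3$ this gives $\theta = 0.5$ and residual variance $2.5$; one can check that $\theta^2 a + 2.5 = 3 = c$, so the sampled $Y$ has the right marginal variance.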


Theorem 14 
Let $H \subset L^2(\Omega, \mathcal{F}, P)$ be a centered Gaussian space, and let $K \subset H$ be a closed subspace. Then for any $X \in H$,
we have
\[ X \mid \sigma(K) \sim N\left( \pi_K(X), \; E[(X - \pi_K(X))^2] \right), \]
where $\pi_K$ denotes the orthogonal projection of $X$ onto $K$.


In geometric terms, the mean is the “parallel” part, and the variance is the “perpendicular” part. (Also, $\pi_K(X)$ is
measurable with respect to $\sigma(K)$, so it is indeed known once we condition on the sigma-algebra.) In contrast, notice
that for a general $X \in L^2(\Omega, \mathcal{F}, P)$ (not necessarily Gaussian), we only know that
\[ E(X \mid \sigma(K)) = \pi_{L^2(\Omega, \sigma(K), P)}(X). \]
In particular, $L^2(\Omega, \sigma(K), P)$ is generally very big: if $K$ is the span of some variable $Z$, then this space contains all
(square-integrable) measurable functions of $Z$. So this theorem tells us that we can project onto a much smaller subspace in the Gaussian case.


Problem 15 (Kalman filter; on homework) 


Suppose we have independent Gaussians $\varepsilon_n \sim N(0, \sigma^2)$ and $\eta_n \sim N(0, \delta^2)$ and we have some true unknown state
of a system which evolves over time:
\[ 0 = X_0 \to X_1 \to \cdots \to X_n, \qquad X_{n+1} = a_n X_n + \varepsilon_{n+1}. \]
Suppose we're given a noisy observation $Y_n$ at each time satisfying $Y_n = cX_n + \eta_n$. (Assume that we know the
values of $a_n$ and $c$ and $\sigma^2$ and $\delta^2$.) Our goal is to find $E[X_n \mid Y_1, \cdots, Y_n]$.


One approach is to define a Gaussian space $H$ which is the span of the $\varepsilon_i$ and $\eta_i$ up to some time $n$. (In other
words, this is all of the noise going into the system up to time $n$.) From the way this is designed, all of the $X_i$ and $Y_i$
are linear combinations of the $\varepsilon$s and $\eta$s, so we always stay in the Gaussian space throughout the evolution. Then if
we want $E[X_n \mid Y_1, \cdots, Y_n]$, we're doing an orthogonal projection. In particular, the answer must be a linear combination of $Y_1$
up to $Y_n$.
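One standard way to organize this projection is the scalar Kalman recursion, which alternates a prediction step and a one-dimensional projection onto each new observation. The sketch below is our own illustration under the model as reconstructed above ($X_{k+1} = a_k X_k + \varepsilon_{k+1}$, $Y_k = cX_k + \eta_k$, $X_0 = 0$); the function name and argument layout are assumptions, not the homework's notation:

```python
def kalman_filter(ys, a, c, sigma2, delta2):
    """Return (E[X_n | Y_1..Y_n], conditional variance) for the scalar model.

    ys = [y_1, ..., y_n] are the observations and a = [a_0, ..., a_{n-1}].
    """
    mean, var = 0.0, 0.0  # X_0 = 0 is known exactly
    for y, a_k in zip(ys, a):
        # Predict X_{k+1} = a_k X_k + eps_{k+1} before seeing Y_{k+1}.
        pred_mean = a_k * mean
        pred_var = a_k * a_k * var + sigma2
        # Project onto the new observation Y_{k+1} = c X_{k+1} + eta_{k+1}.
        gain = pred_var * c / (c * c * pred_var + delta2)
        mean = pred_mean + gain * (y - c * pred_mean)
        var = (1.0 - gain * c) * pred_var
    return mean, var
```

Each update is the bivariate computation of Problem 13 applied to the pair (predicted state, new observation). The returned mean is linear in the observations, and the variance does not depend on them at all, as the Gaussian-space picture predicts.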


We'll next discuss Gaussian processes, which are generalizations of Gaussian vectors: 


Definition 16 
Let $I$ be an arbitrary (possibly uncountable) index set. A collection of random variables $(X_t)_{t \in I}$ is a (centered)
Gaussian process if any finite linear combination of the $X_t$s is a one-dimensional Gaussian. The Gaussian space
generated by $(X_t)_{t \in I}$ is the closure of the linear span of the $X_t$s. Define the covariance function $\Gamma : I \times I \to \mathbb{R}$
via $\Gamma(s, t) = \operatorname{Cov}(X_s, X_t)$.


The covariance function $\Gamma$ is symmetric and positive semidefinite, meaning that $\sum_{s,t} \theta(s)\theta(t)\Gamma(s, t) \ge 0$ for any
$\theta$ which is nonzero for finitely many values in $I$ (otherwise, this statement may not make any sense), since this
expression is just the variance of $\sum_s \theta(s) X_s$.

The natural follow-up question is whether there necessarily exists a Gaussian process with a given (symmetric,
positive semidefinite) covariance function. The answer is yes, and this follows basically from the Kolmogorov extension
theorem (which we proved in 18.675, so we won't do now). The most important example for us will be the construction
of Brownian motion based on a Gaussian process with index set $\mathbb{R}_{\ge 0}$, where the function takes the form
\[ \Gamma(s, t) = \operatorname{Cov}(B_s, B_t) = \min(s, t) \]
(because if WLOG $s < t$, then $B_t = B_s + (B_t - B_s)$, and the second term is independent of $B_s$). But we don't want
to immediately cite the Kolmogorov extension theorem now, because we want to make sure that with probability 1,
our process is continuous in $t$; that is, for almost every $\omega \in \Omega$, $B_t(\omega)$ is continuous in $t$. Specifically, the theorem will
give us a measure $\nu$ on $(\mathbb{R}^I, \mathcal{B}^{\otimes I})$, but that's not really the space we want to use: we want the space of continuous
functions instead. So we'll come back to this a little later.
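Positive semidefiniteness of $\min(s,t)$ can be seen concretely by rewriting $\sum_i \theta_i B_{t_i}$ in terms of independent increments: for sorted times $0 \le t_1 < \cdots < t_m$ (with $t_0 = 0$), the quadratic form equals $\sum_k (t_k - t_{k-1}) \big( \sum_{i \ge k} \theta_i \big)^2$, a sum of squares. The following sketch (function names are ours) computes both sides:

```python
def gram_quadratic_form(times, theta):
    """Compute sum_{i,j} theta_i theta_j min(t_i, t_j) directly."""
    return sum(
        a * b * min(s, t)
        for s, a in zip(times, theta)
        for t, b in zip(times, theta)
    )

def increment_form(times, theta):
    """Same quantity as a sum of squares over independent increments.

    Requires times sorted in increasing order, with all times >= 0.
    """
    total, previous = 0.0, 0.0
    for k, t in enumerate(times):
        tail = sum(theta[k:])  # coefficient of the increment B_{t_k} - B_{t_{k-1}}
        total += (t - previous) * tail * tail
        previous = t
    return total
```

With `times = [0.5, 1.0, 2.5]` and `theta = [1.0, -2.0, 0.5]`, both functions give 1.625. Since the second form is manifestly nonnegative, so is the first.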

On our homework, though, there's a different construction of Brownian motion based on the construction of “white 
noise,” which is what we'll discuss for the rest of class. The heuristic idea is that on every “pixel” of space, we see an 


independent Gaussian random variable, so we get “snow on a TV screen.” Here’s a more formal definition: 


Definition 17 


Let $(E, \mathcal{E})$ be a measurable space, and let $\mu$ be a $\sigma$-finite measure on $(E, \mathcal{E})$. Then a Gaussian white noise on
$(E, \mathcal{E})$ with intensity $\mu$ is a linear isometry $G : L^2(E, \mathcal{E}, \mu) \to H$, where $H \subset L^2(\Omega, \mathcal{F}, P)$ is a centered Gaussian
space.


An isometry preserves the inner product, so $\langle f, g \rangle = \langle G(f), G(g) \rangle$. (In other words, the covariance between $G(f)$
and $G(g)$ should be the same as the inner product between $f$ and $g$.) So for any “patch” of size $dx$ around $x$, we
assign a Gaussian variable $N(0, \mu(dx))$, and we do this for all points independently. So if we take a subset $A \subset E$, we
can think of the white noise being distributed as the “sum of all of these small Gaussian random variables:”
\[ G(1_A) = G(A) \sim N(0, \mu(A)). \]
This is consistent with the isometry property because $A$ and $B$ being disjoint means we have independence $G(A) \perp\!\!\!\perp G(B)$, so
\[ 0 = \langle 1_A, 1_B \rangle = \operatorname{Cov}(G(1_A), G(1_B)). \]
In the informal language of “adding up infinitesimal Gaussian random variables,” we can write $G(f) = \sum_{x \in E} f(x) Z_x$
where $Z_x \sim N(0, \mu(dx))$, which helps us see that
\[ \operatorname{Cov}(G(f), G(g)) = \operatorname{Cov}\left( \sum_{x \in E} f(x) Z_x, \; \sum_{y \in E} g(y) Z_y \right) = \sum_{x \in E} f(x) g(x) \mu(dx) \]
(only the diagonal terms emerge), which gives exactly the inner product between $f$ and $g$. But of course, this doesn't
exactly make sense without the formal definition.
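The informal "one Gaussian per pixel" picture does make sense on a finite grid, which gives a discrete stand-in for white noise on $[0,1]$ with Lebesgue intensity. The sketch below is our own illustration (grid size, sample count, and function names are arbitrary choices): it assigns an independent $N(0, dx)$ variable to each cell, defines $G(f) = \sum_k f(x_k) Z_k$, and estimates $\operatorname{Cov}(G(1_{[0,s]}), G(1_{[0,t]}))$, which should be close to $\langle 1_{[0,s]}, 1_{[0,t]} \rangle = \min(s, t)$.

```python
import math
import random

def sample_white_noise(cells, rng):
    """One Gaussian per grid cell of [0, 1]: Z_k ~ N(0, dx) with dx = 1 / cells."""
    dx = 1.0 / cells
    return [rng.gauss(0.0, math.sqrt(dx)) for _ in range(cells)]

def apply_noise(noise, f):
    """Discrete stand-in for G(f): sum of f(midpoint of cell k) * Z_k."""
    cells = len(noise)
    return sum(f((k + 0.5) / cells) * z for k, z in enumerate(noise))

def indicator(t):
    """The function 1_{[0, t]}."""
    return lambda x: 1.0 if x <= t else 0.0

def empirical_covariance(s, t, cells=50, samples=4000, seed=0):
    """Estimate Cov(G(1_[0,s]), G(1_[0,t])), which should be close to min(s, t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        noise = sample_white_noise(cells, rng)
        total += apply_noise(noise, indicator(s)) * apply_noise(noise, indicator(t))
    return total / samples
```

This is only a finite-dimensional caricature: it checks the covariance structure on a grid and says nothing about constructing $G$ on all of $L^2$ or about continuity of paths.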

However, even when we have this definition, we still need to ask whether we can actually construct such an object.
The last question on our homework asks us to construct an explicit white noise $G$ and in fact construct Brownian
motion from that via $B_t = G(1_{[0,t]})$. This Brownian motion will have the right covariance properties, and we can show
that $B$ is continuous almost surely. This is actually the historically older construction of Brownian motion.

We'll finish by contrasting all of this with something else we might have seen: compare this Gaussian white noise
of $N(0, \mu(dx))$ to the Poisson random measure, where for each patch of $dx$, we assign a Bernoulli random variable
of parameter $\mu(dx)$. Then taking subsets $A \subset E$, we'll get a normal distribution in the white noise case with variance
$\mu(A)$, but a Poisson distribution of parameter $\mu(A)$ in the Poisson random measure case.

Last time, we talked about the general definitions of finite-dimensional Gaussian vectors, Gaussian spaces, Gaussian
processes, and Gaussian white noise. Today, we'll go through the construction of Brownian motion, but we'll
need to cover a few preliminaries first.

Recall that if we have a covariance function $\Gamma : I \times I \to \mathbb{R}$ which is symmetric and positive semidefinite, then
there exists a Gaussian process $(X_t)_{t \in I}$ with covariance function $\operatorname{Cov}(X_s, X_t) = \Gamma(s, t)$. (We will often be working
with $I = [0, \infty)$.) We showed this with the Kolmogorov extension theorem. Informally, the idea is that to define
a measure on $\mathbb{R}^I$, we need to be able to write down the joint law for any finite subset of $I$ in a consistent way, and
then the Kolmogorov extension theorem gives us the measure $\nu$ on $(\mathbb{R}^I, \mathcal{B}^{\otimes I})$. In our case, if we're given the values
of $\Gamma(s, t)$, then $\Gamma$ yields a covariance matrix for every finite collection of points in $I$.


Definition 18 


In the special case where $I = [0, \infty)$ and $\Gamma(s, t) = \min(s, t)$, the resulting Gaussian process $(X_t)_{t \ge 0}$ is called a
pre-Brownian motion.


Here’s a quick connection to the material from the end of last lecture: 


Proposition 19 
Let $(X_t)_{t \ge 0}$ be a real-valued stochastic process (any collection of random variables on $(\Omega, \mathcal{F}, P)$ indexed by $t$).
Then the following are equivalent:

* $X$ is a pre-Brownian motion,
* We can express $X_t = G((0, t]) = G(1_{(0,t]})$, where $G$ is a Gaussian white noise on $I = [0, \infty)$ with intensity
equal to the Lebesgue measure.


Proof. The backwards direction follows directly from the definition of a Gaussian white noise: $G$ is defined to be an
isometry, so we do get the correct covariance function. For the forward direction, we're given a pre-Brownian motion,
and we need to construct an isometry. If $f$ is a step function of the form
\[ f(t) = \sum_{i=1}^n a_i 1\{t \in (t_{i-1}, t_i]\}, \]
then we define
\[ G(f) = \sum_{i=1}^n a_i (X_{t_i} - X_{t_{i-1}}). \]
We can check that $G$ is an isometry on this class of step functions. Specifically, if $h(t) = \sum_{i=1}^n b_i 1\{t \in (t_{i-1}, t_i]\}$
(without loss of generality we can assume the two functions have the same break points $t_i$), then the covariance can
be computed as
\[ \langle G(f), G(h) \rangle = \sum_{i=1}^n a_i b_i (t_i - t_{i-1}) = \langle f, h \rangle \]
because distinct increments of $X$ are independent by the definition of a pre-Brownian motion. So we do have an
isometry $G$ from the step functions to the Gaussian space $H$ spanned by $X$, and now we just need to define $G$ on all
of $L^2$. But the step functions are dense in $L^2([0, \infty))$, so we can extend $G$ by finding step functions $f_n$ that converge
in $L^2$ to a general $f \in L^2([0, \infty))$. Because the $f_n$ converge to $f$ in $L^2$, they form a Cauchy sequence (in $L^2$); since $G$
preserves distances, $G(f_n)$ is a Cauchy sequence in the Gaussian space. Since the Gaussian space is a closed subspace of $L^2$,
this means $G(f_n)$ will converge in $L^2$ to a limit $G(f)$, and this gives us the isometry we want.


As a reminder, we're going to construct a specific white noise $G$ which guarantees continuity of sample paths.
The generic definition we have here doesn't contain such a guarantee, because the Kolmogorov extension theorem
gives us a process $X = (X_t)_{t \ge 0}$ which is just some random element of the space $(\mathbb{R}^I, \mathcal{B}^{\otimes I}, \nu)$ for $I = [0, \infty)$. This
sigma-algebra contains events of the form
\[ \{X_{t_1} \in A_1, X_{t_2} \in A_2, \cdots, X_{t_n} \in A_n\}, \]
where $A_i$ are Borel subsets of the real line, as well as an event like
\[ \{X_t = 0 \;\; \forall t \in \mathbb{Q}\}, \]
which is a countable intersection of the events above. On the other hand, events that are not measurable are things
like
\[ \{X_t = 0 \;\; \forall t \in I\}, \]
because an uncountable intersection of events need not be measurable, and similarly
\[ \{X_t \text{ continuous in } t\}, \quad \{X_t \text{ measurable in } t\} \]
also require us to know about $X_t$ on uncountably many values of $t$. So the probability space isn't rich enough to
capture properties like continuity. Here is an example of something that can go wrong:


Example 20 


Let $X$ be a Gaussian process with $\Gamma(s, t) = \min\{s, t\}$ on $(\mathbb{R}^I, \mathcal{B}^{\otimes I}, \nu)$. We will introduce some additional randomness
by augmenting the probability space to include the uniform random variable $U \sim \text{Unif}[0, 1]$ independent of
$X$. (This means we've moved to a larger space $(\Omega, \mathcal{F}, P) = (\mathbb{R}^I, \mathcal{B}^{\otimes I}, \nu) \otimes ([0, 1], \mathcal{B}, \text{Leb})$, where we draw $X$ and
then independently draw $U$.) Now if we sample $\omega' \in \mathbb{R}^I$ and $u \in [0, 1]$, we can define a new random variable
\[ \tilde{X}_t(\omega', u) = X_t(\omega') + 1\{t = u\}. \]


If we define $X_t(\omega', u) = X_t(\omega')$, then $X_t$ and $\tilde{X}_t$ are closely related: for any fixed $t$, $X_t$ and $\tilde{X}_t$ are equal with
probability 1, because any fixed time $t$ has probability 0 of being equal to $u$. But $X$ and $\tilde{X}$ aren't the same process,
and in particular they can't both be continuous because we add a 1 to one of them at some random time. So $X_t$ and
$\tilde{X}_t$ are both Gaussian processes with the correct covariance, but we can't guarantee continuity even though we have
the same finite-dimensional marginals.


Definition 21 
Let $(X_t)_{t \in I}$ and $(\tilde{X}_t)_{t \in I}$ be two processes. We call $\tilde{X}$ a modification of $X$ if $P(X_t = \tilde{X}_t) = 1$ for any fixed time
$t \in I$, and we say that $X$ and $\tilde{X}$ are indistinguishable if
\[ \{X_t \ne \tilde{X}_t \text{ for some } t \in I\} \subset N \]
for some measure-zero set $N$ (we phrase it this way just because the event may not actually be measurable).


Note that $X_t$ and $\tilde{X}_t$ above are not indistinguishable, because they will always be different at some time $t = u$
with probability 1. The main goal of this lecture is to construct a modification of $X$ which is continuous, but we'll
have to change the probability space a little bit to do that. The actual construction is very concrete: we'll
first define Brownian motion on $[0, 1]$, looking at the dyadic points
\[ D_n = \left\{ 0, \frac{1}{2^n}, \frac{2}{2^n}, \cdots, \frac{2^n - 1}{2^n} \right\}. \]
These sets $D_n$ are nested in each other ($D_0 \subset D_1 \subset \cdots$), and their infinite union $D = \bigcup_{n \ge 0} D_n$ is countable and dense
in $[0, 1]$. At a very high level, $D$ is a countable dense subset, and we'll only look at the process $X$ on $D$. Then for any
other value not in $D$, we'll define the process using continuity, and we just need to show that we do end up with a
continuous process.

Lemma 22
Let $f : D \to \mathbb{R}$ be a function that satisfies
\[ \left| f\left( \frac{i}{2^n} \right) - f\left( \frac{i-1}{2^n} \right) \right| \le \frac{K}{2^{n\alpha}} \]
for all $n \ge 1$ and $1 \le i \le 2^n - 1$, for some constant $K$ and some $\alpha > 0$. Then $f$ satisfies a similar type of estimate
for all points in $D$: we have $|f(s) - f(t)| \le K'|s - t|^\alpha$ for all $s, t \in D$ with $K' = \frac{2K}{1 - 2^{-\alpha}}$. (In other words, $f$ is an
$\alpha$-Hölder function on $D$.)


(This is completely deterministic: there's no probability going on here, and note that $f$ is defined on $D$ only.)


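Proof. Without loss of generality, let $s < t$. Then there is some integer $p$ such that
\[ \frac{1}{2^p} \le t - s < \frac{1}{2^{p-1}}, \]
which means $s$ and $t$ are either in adjacent $\frac{1}{2^p}$ blocks or separated by one block. Either way, let $s_0$ be the smallest
point in $D_p$ that is at least $s$ and $t_0$ be the largest point in $D_p$ that is at most $t$. Then, the idea is that “the best way to get from
$s$ to $t$ should use the largest jumps, because the small jumps don't give a good estimate,” so formally we can write
\[ s = s_0 - \sum_{\ell=1}^n \frac{\delta_\ell}{2^{p+\ell}}, \qquad \delta_\ell \in \{0, 1\} \]
(looking at the steps of size $\frac{1}{2^{p+1}}$, $\frac{1}{2^{p+2}}$, and so on and always taking them if we can), and similarly
\[ t = t_0 + \sum_{\ell=1}^n \frac{\eta_\ell}{2^{p+\ell}}, \qquad \eta_\ell \in \{0, 1\}. \]
In the worst case, we will need all of these steps (plus possibly one step of size $\frac{1}{2^p}$ from $s_0$ to $t_0$), which means that by the assumption in our lemma we have
\[ |f(s) - f(t)| \le \frac{K}{2^{p\alpha}} + 2 \sum_{\ell \ge 1} \frac{K}{2^{(p+\ell)\alpha}} \le \frac{K'}{2^{p\alpha}} \le K'|s - t|^\alpha \]
as desired.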


Lemma 23 


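Suppose that $(X_t)_{t \in [0,1]}$ is any stochastic process satisfying $E[|X_s - X_t|^q] \le C|s - t|^{1+\varepsilon}$ for all $s, t \in [0, 1]$. Then
for all $\alpha \in \left( 0, \frac{\varepsilon}{q} \right)$, there exists (almost surely) a $K_\alpha(\omega) < \infty$ such that
\[ |X_{i/2^n}(\omega) - X_{(i-1)/2^n}(\omega)| \le \frac{K_\alpha(\omega)}{2^{n\alpha}} \]
for all $n \ge 1$ and $1 \le i \le 2^n - 1$.


This lemma basically says that the conditions from the previous lemma hold, except now we have a random $K$.


Proof. Let $A_n$ be the event $\{\omega : |X_{i/2^n}(\omega) - X_{(i-1)/2^n}(\omega)| > \frac{1}{2^{n\alpha}} \text{ for some } 1 \le i \le 2^n - 1\}$ (the numerator of 1 here is
good enough for the calculations). By a union bound (over all $2^n$ possibilities) and Markov's inequality, we have that
\[ P(A_n) \le 2^n \cdot 2^{n\alpha q} \max_i E\left[ |X_{i/2^n} - X_{(i-1)/2^n}|^q \right] \le 2^n \cdot 2^{n\alpha q} \cdot C \left( \frac{1}{2^n} \right)^{1+\varepsilon} = \frac{C}{2^{n(\varepsilon - \alpha q)}}, \]
where the middle inequality comes from our assumption. But by Borel-Cantelli, $P(A_n \text{ i.o.}) = 0$ because the $P(A_n)$ are bounded by a convergent
geometric series (recall $\alpha q < \varepsilon$). So if $\omega$ lies outside the measure-zero event $\{A_n \text{ i.o.}\}$, then we must have an $N(\omega)$ such that $\omega \notin A_n$
for all $n \ge N(\omega)$. That means that with probability 1, the quantity inside the supremum of
\[ \sup_{n \ge 1} \left\{ \max_{1 \le i \le 2^n - 1} |X_{i/2^n}(\omega) - X_{(i-1)/2^n}(\omega)| \cdot 2^{n\alpha} \right\} \]
is at most 1 for all $n \ge N(\omega)$, meaning that
\[ \sup_{n \ge 1} \left\{ \max_{1 \le i \le 2^n - 1} |X_{i/2^n}(\omega) - X_{(i-1)/2^n}(\omega)| \cdot 2^{n\alpha} \right\} \le \max\left\{ 1, \; \max_{n < N(\omega)} \left\{ \max_{1 \le i \le 2^n - 1} |X_{i/2^n}(\omega) - X_{(i-1)/2^n}(\omega)| \cdot 2^{n\alpha} \right\} \right\}. \]
The right-hand side is now some finite number for each $\omega$ (because the max is being taken over a finite set), and in
fact that is the $K_\alpha(\omega)$ from the lemma statement.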


Lemma 24 


Under the same assumptions as Lemma 23, there is a modification $\tilde{X}$ of $X$ whose sample paths are continuous.
In fact, the sample paths will be $\alpha$-Hölder continuous for all $\alpha \in \left( 0, \frac{\varepsilon}{q} \right)$.


Proof. Let $E$ be the event that the estimate of Lemma 23 holds for $D$, meaning that $E$ is the complement of the
event $\{A_n \text{ i.o.}\}$ from the previous proof. Then by Lemma 22, we know that
\[ |X_s(\omega) - X_t(\omega)| \le K'_\alpha(\omega)|s - t|^\alpha \]
for all $\omega \in E$ and all $s, t \in D$. Now we define $\tilde{X}$ in the way we said we would (extending by continuity):
\[ \tilde{X}_t(\omega) = \begin{cases} \lim_{s \to t, \, s \in D} X_s(\omega) & \text{if } \omega \in E, \\ 0 & \text{otherwise.} \end{cases} \]
(So if we're in the measure-zero situation where the estimates don't hold, then $\tilde{X}$ is zero for all time.) But now $\tilde{X}$ is
an $\alpha$-Hölder continuous function for all $\omega$, since it satisfies the same estimate with the same $K'$:
\[ |\tilde{X}_s - \tilde{X}_t| \le K'_\alpha(\omega)|s - t|^\alpha \]
for all $s, t \in [0, 1]$ (not just $D$). We do still need to check that $\tilde{X}$ is a modification of $X$. In principle, it seems
like we may have changed a lot from $X$, because we've ignored its value everywhere except on a countable set. But
remember that we have the assumption $E[|X_s - X_t|^q] \le C|s - t|^{1+\varepsilon}$, so as $s$ converges to $t$, the right hand side goes
to 0, meaning $X_s$ goes to $X_t$ in $L^q$ and therefore also in probability. On the other hand, $X_s$ goes almost surely to
$\tilde{X}_t$ as $s \to t$ for $s \in D$ by definition. Therefore $X_t = \tilde{X}_t$ almost surely by uniqueness of limits, because they are both
limits of $X_s$ as $s$ approaches $t$.


The combination of these three lemmas is called the Kolmogorov continuity lemma. We'll take a closer look now at one of the bounds

$$P(A_n) \le 2^n \cdot 2^{n\alpha q} \cdot C\left(\frac{1}{2^n}\right)^{1+\epsilon}$$

that we had in our proofs earlier. Notice that we've used the dyadic partitioning twice here: once to go from $s$ to $t$, where $s, t \in D_n$ are separated by many intervals of length $\frac{1}{2^n}$, and once in this Markov bound to use the same bound many times. A more naive way we could have done our bound is to say that because we have $|s - t| = \frac{L}{2^n}$ for some $L$ and some $n$, we can consider the probability that $|X_s - X_t| \le \left(\frac{L}{2^n}\right)^\alpha$ for all $s \in D_n$ and $t = s + \frac{L}{2^n}$. A union bound here would not use the fact that our intervals are overlapping, and we would arrive at the weaker result

$$P\left(|X_s - X_t| > \left(\frac{L}{2^n}\right)^\alpha \text{ for some } s \in D_n,\ t = s + \frac{L}{2^n}\right) \le 2^n \cdot \frac{C (L/2^n)^{1+\epsilon}}{(L/2^n)^{\alpha q}} = C \cdot L^{1 - (\alpha q - \epsilon)} \cdot 2^{n(\alpha q - \epsilon)}.$$

The $L^{1-(\alpha q - \epsilon)}$ term is large here (possibly close to the order of $L$), and $L$ is possibly on the order of $2^{n-1}$, so our union bound has lost a lot from the overlapping intervals. So the point is that we should be reusing bounds that we already have and “taking the largest steps possible!”

We'll finish this class by applying these lemmas to Brownian motion, finally completing the construction that we want. We have

$$E[|X_s - X_t|^q] = E[|N(0, |s - t|)|^q] = |s - t|^{q/2} E[|Z|^q]$$

because $X_s - X_t$ is a centered normal random variable with variance $|t - s|$. So if we set $1 + \epsilon = \frac{q}{2}$, the above lemmas allow us to make a modification to $X$ that is $\alpha$-Hölder for $\alpha < \frac{\epsilon}{q} = \frac{q/2 - 1}{q}$. This estimate holds for any $q > 2$, so in particular if we take large $q$, this approaches $\frac{1}{2}$. So our modification is just short of $\frac{1}{2}$-Hölder continuous, and the obvious question is whether this is optimal. It turns out that the answer is yes, and this is the last part of our homework.
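To make the role of $q$ concrete, here is a minimal Python sketch that evaluates the exponent bound $\frac{q/2 - 1}{q} = \frac{1}{2} - \frac{1}{q}$ for a few values of $q$. The function name `holder_exponent_bound` is only an illustrative choice, not something from the lecture.

```python
def holder_exponent_bound(q):
    """Upper limit (q/2 - 1)/q on the Holder exponent obtained from the q-th moment bound."""
    if q <= 2:
        # epsilon = q/2 - 1 must be positive for the continuity lemma to apply
        raise ValueError("need q > 2 so that epsilon = q/2 - 1 > 0")
    return (q / 2 - 1) / q


for q in (4, 10, 100, 10_000):
    print(q, holder_exponent_bound(q))
```

The printed values increase toward 1/2 without reaching it, which matches the statement that the modification is $\alpha$-Hölder for every $\alpha$ strictly below 1/2.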

Thus, we have constructed a process with continuous sample paths and the correct covariance! Next time, we'll 


talk about the probability space that is “canonical” for this Brownian motion. 


4 February 12, 2020 


As a reminder, the website for this class contains links to some useful references — in particular, you can find these 
notes that you're reading now. (These shouldn't be considered an official resource, though, since they aren't being 
checked by the course instructors. ) 

Recall from last time that we defined a pre-Brownian motion to be a Gaussian process with covariance function $\Gamma(s,t) = \min(s, t)$, which allows us to define a Brownian motion to be a pre-Brownian motion with continuous sample paths. The idea is as follows: suppose $(X_t)_{t \in [0,1]}$ is a pre-Brownian motion on any probability space $(\Omega, \mathcal{F}, P)$ rich enough to support such a process. The main content of last class was the Kolmogorov continuity lemma, which basically tells us that a modification $(B_t)_{t \in [0,1]}$ of $X$ exists with sample paths $\alpha$-Hölder for all $\alpha \in (0, \frac{1}{2})$. In other words, there exists a $K_\alpha(\omega) < \infty$ such that

$$|B_s(\omega) - B_t(\omega)| \le K_\alpha(\omega) |s - t|^\alpha \quad \forall s, t \in [0,1],$$

and in particular this tells us that $B$ is also continuous. (We constructed this by only looking at the dyadic set, showing continuity there, and then taking limits.)

Our next question is what happens if we consider a pre-Brownian motion $(X_t)_{t \ge 0}$ for all nonnegative $t$ rather than just the interval $[0,1]$. The idea is that we can just apply the above results on $[i, i+1]$ for each integer $i \ge 0$, which tells us that a modification $(B_t)_{t \ge 0}$ exists which is locally $\alpha$-Hölder (on every compact interval) and therefore also continuous; in particular, $B$ satisfies the definition of a Brownian motion. Letting $I = [0, \infty)$, define the set

$$C(I) = \{\text{continuous functions } I \to \mathbb{R}\} \subset \{\text{all functions } I \to \mathbb{R}\} = \mathbb{R}^I.$$

The sigma-algebra we can place on $\mathbb{R}^I$ is the product sigma-algebra $\mathcal{Q} = \mathcal{B}^{\otimes I}$, so a natural sigma-algebra to place on $C(I)$ is

$$\mathcal{G} = \mathcal{Q}|_{C(I)} = \{C(I) \cap A : A \in \mathcal{Q}\}.$$

The Brownian motion $B$ now gives us a measurable (exercise) mapping which induces a measure on $C(I)$: we can write this as

$$B : (\Omega, \mathcal{F}, \mathbb{P}) \to (C(I), \mathcal{G}, P = B_\# \mathbb{P}).$$

In other words, for all $A \in \mathcal{G}$, we assign it the measure $P(A) = \mathbb{P}(B^{-1}(A))$, where $B^{-1}$ denotes the pre-image under the Brownian motion. While $(\Omega, \mathcal{F}, \mathbb{P})$ is not unique, because there can be all kinds of “extra randomness” in the probability space, we do have the following:


Proposition 25

The measure $P$ (called the Wiener measure) is unique.

Proof. It suffices to show that the value of $P$ is uniquely determined, and it's enough to check this for a pi-system which generates the sigma-algebra $\mathcal{G}$. Consider the probability of a “simple” event like

$$P(B_{t_1} \in [a_1, b_1], \cdots, B_{t_n} \in [a_n, b_n]).$$

Events of this type generate $\mathcal{G}$, and we can calculate the probability using the covariance matrix $\Gamma(t_i, t_j)$ (where $i, j$ run from 1 to $n$) to be

$$P\left(N\big(0, (\Gamma(t_i, t_j))_{i,j}\big) \in \prod_{i=1}^n [a_i, b_i]\right).$$

But this is an explicit value we can find by calculating an integral, and this characterizes the value of $P(E)$ for all events $E$ in the pi-system generating $\mathcal{G}$, so we're done.


Fact 26

We defined a sigma-algebra $\mathcal{G}$ on $C(I)$ by restriction, but here's another way to characterize it. A natural topology to put on $C(I)$ is to say that $f_n$ converges to $f$ if $f_n$ converges to $f$ uniformly on compact sets; this topology is metrizable because we can define the metric

$$d(f, g) = \sum_{n=1}^\infty 2^{-n} \min\left\{1, \sup_{0 \le t \le n} |f(t) - g(t)|\right\}$$

(in particular, $d(f, g) \le 1$ for all $f, g$). Then $d(f_n, f)$ goes to 0 if and only if $f_n \to f$ locally uniformly, so the metric captures the topology.
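As a rough numerical illustration of this metric, the following Python sketch approximates $d(f, g)$ by truncating the sum and replacing each supremum over $[0, n]$ by a maximum over a finite grid. The function name and the parameters `n_terms` and `grid_per_unit` are assumptions made for this sketch, and the grid maximum only approximates the true supremum.

```python
def local_uniform_distance(f, g, n_terms=30, grid_per_unit=200):
    """Approximate d(f, g) = sum_{n>=1} 2^{-n} * min(1, sup_{0<=t<=n} |f(t) - g(t)|)."""
    total = 0.0
    running_sup = 0.0  # grid approximation of sup over [0, n]
    for n in range(1, n_terms + 1):
        # extend the running supremum from [0, n-1] to [0, n]
        for k in range(grid_per_unit + 1):
            t = (n - 1) + k / grid_per_unit
            running_sup = max(running_sup, abs(f(t) - g(t)))
        total += 2.0 ** (-n) * min(1.0, running_sup)
    return total


zero = lambda t: 0.0
print(local_uniform_distance(zero, lambda t: 0.25))  # constant gap of 0.25
print(local_uniform_distance(zero, lambda t: t))     # gap grows without bound
```

Because of the $\min\{1, \cdot\}$ truncation, two functions that drift arbitrarily far apart still have distance at most 1, while functions that agree on $[0, N]$ are within $2^{-N}$ of each other.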


Proposition 27

The Borel sigma-algebra of $C(I)$ in the $d$-topology, denoted $\mathcal{H}$, is the same as the sigma-algebra $\mathcal{G}$ above.

Proof. First, we show that $\mathcal{G} \subset \mathcal{H}$. It's enough to show that the events generating $\mathcal{G}$ are in $\mathcal{H}$; consider events of the form (which generate $\mathcal{G}$)

$$\{B_{t_1} \in [a_1, b_1], \cdots, B_{t_n} \in [a_n, b_n]\}.$$

This set is closed with respect to the $d$-topology (if we have a sequence of functions in this set, then any limit point will also have $B_{t_i} \in [a_i, b_i]$ for each $i$, because locally uniform convergence implies pointwise convergence), and the Borel sigma-algebra contains all of the closed sets, so such sets are indeed in $\mathcal{H}$ and thus all of $\mathcal{G}$ is contained in $\mathcal{H}$.

To show the other direction, it's enough to show that every open ball belongs to $\mathcal{G}$ (open balls generate $\mathcal{H}$ because $C(I)$ is separable). The set $\{h : d(f, h) < \epsilon\}$ belongs to $\mathcal{G}$, because we can measure $d(f, h)$ by looking just at rational $t$ (basically we're saying that while the definition of $d$ takes a supremum over the whole interval $[0, n]$, we can restrict to only a countable set of indices by continuity). Thus, this open ball is also measurable in $\mathcal{G}$, so $\mathcal{H} \subset \mathcal{G}$.


So far, we've been treating Brownian motion just as a function of $t$, but now we want to actually make the process “move forwards or backwards in time.”

Definition 28

Define the sigma-algebras

$$\mathcal{F}_t = \sigma(B_s : s \le t), \qquad \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s.$$

Intuitively, $\mathcal{F}_{t+}$ gives an “infinitesimal amount of information past time $t$.” For example, if the process $B$ had a right derivative $\lim_{h \downarrow 0} \frac{B_{t+h} - B_t}{h}$, then it would be measurable with respect to $\mathcal{F}_{t+}$ but not $\mathcal{F}_t$. (But we'll see soon that Brownian motion does not have a right derivative.) Recall that we've characterized Brownian motion in a way that ensures it has independent increments for disjoint time intervals (from the definition of the covariance function); this next result is a stronger version of that statement:


Proposition 29 (Markov property, simplest version)

Suppose that $(B_t)_{t \ge 0}$ is a Brownian motion. Then for any fixed time $s \ge 0$, the process $(B_{s+t} - B_s)_{t \ge 0}$ is a Brownian motion independent of $\mathcal{F}_s$.

Proof. Let $W_t = B_{s+t} - B_s$ be our new process. We must show that $W_t$ is a pre-Brownian motion and that it has continuous sample paths. The latter is true because the sample paths of $W$ are just shifted pieces of the sample paths of $B$, and the former follows because $W$ has the correct covariance function. Furthermore, because Brownian motion has independent increments, we can make the independence statement

$$(W_{t_1}, W_{t_2}, \cdots, W_{t_k}) \perp (B_{r_1}, \cdots, B_{r_m})$$

for any $r_1, \cdots, r_m \le s$. A pi-lambda argument then tells us that $W \perp \mathcal{F}_s$, as desired.


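Proposition 30 (Markov property, slight improvement)

Under the same setting as Proposition 29, we have $W \perp \mathcal{F}_{s+}$.

Proof. Proposition 29 applied at time $s + \epsilon$ shows that increments of $B$ after time $s + \epsilon$ are independent of $\mathcal{F}_{s+\epsilon} \supseteq \mathcal{F}_{s+}$, but $W$ measures increments starting from time $s$ itself. To get around that, note that we can write

$$(W_{t_1}, W_{t_2}, \cdots, W_{t_k}) = \lim_{\epsilon \downarrow 0} (W_{t_1+\epsilon} - W_\epsilon, W_{t_2+\epsilon} - W_\epsilon, \cdots, W_{t_k+\epsilon} - W_\epsilon)$$

by continuity of Brownian motion (since $W_\epsilon \to W_0 = 0$). For any fixed $\epsilon > 0$, the vector on the right consists of the increments $B_{s+\epsilon+t_i} - B_{s+\epsilon}$, so it is independent of $\mathcal{F}_{s+\epsilon}$ and hence of $\mathcal{F}_{s+}$. The independence therefore holds in the limit as well (because the limit is measurable with respect to those variables).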


This leads us to the following consequence: 


Theorem 31 (Blumenthal 0-1 law)

For any event $A \in \mathcal{F}_{0+}$, we have $P(A) \in \{0, 1\}$.

In words, $A$ “only sees the Brownian motion at an infinitesimal time at the beginning,” and the idea is that we can't produce something nontrivial that depends on this infinitesimal time.

Proof. By Proposition 30 with $s = 0$, we know that $\sigma(B_s : s > 0)$ is independent of $\mathcal{F}_{0+}$. But $\mathcal{F}_{0+}$ sits inside $\sigma(B_s : s > 0)$, so $\mathcal{F}_{0+}$ is independent of itself; thus for any $A \in \mathcal{F}_{0+}$, we have $P(A) = P(A \cap A) = P(A)^2$ and therefore $P(A) \in \{0, 1\}$.


This gives us a bit more to work with: 


Proposition 32

Let $B$ be a Brownian motion with $B_0 = 0$. Then the following hold:

1. $B$ will cross 0 an infinite number of times almost surely. In other words, for all $\epsilon > 0$, we have almost surely that
$$\sup(B_s : s \in [0, \epsilon]) > 0, \qquad \inf(B_s : s \in [0, \epsilon]) < 0.$$

2. For any $a \in \mathbb{R}$, the hitting time $\tau_a = \inf\{t : B_t = a\}$ is finite almost surely.

For example, this tells us that it doesn't make sense to define “the first time $B$ returns to 0.”

Proof. For (1), for any $\epsilon > 0$ define the event

$$A_\epsilon = \{\sup(B_s : s \in [0, \epsilon]) > 0\}$$

and let $A = \bigcap_{\epsilon > 0} A_\epsilon$. We can restrict to only looking at time-values $s$ which are rational, so $A_\epsilon$ is a measurable set for any $\epsilon$, and similarly this means $A = \bigcap_n A_{1/n}$ is also measurable. Now $A$ is in $\mathcal{F}_{0+}$ because the events $A_{1/n} \in \mathcal{F}_{1/n}$ are decreasing and nested. So by the zero-one law, $P(A)$ is either 0 or 1, and we want to show that it is 1. However, for any $\epsilon > 0$, we have $P(A_\epsilon) \ge P(B_\epsilon > 0) = \frac{1}{2}$ (because Brownian motion is symmetric around 0). Since these events are decreasing as $\epsilon \to 0$, in the limit we have

$$P(A) = \lim_{\epsilon \downarrow 0} P(A_\epsilon) \ge \frac{1}{2} \implies P(A) = 1.$$

(The other statement follows analogously.) Now (2) is actually a consequence of (1), since taking $\epsilon = 1$ tells us that

$$1 = P(\sup(B_s : s \in [0, 1]) > 0) = \lim_{\delta \downarrow 0} P(\sup(B_s : s \in [0, 1]) > \delta)$$

by continuity of measure. Now we can rescale space by $\frac{1}{\delta}$ and rescale time by $\frac{1}{\delta^2}$; the result is still a Brownian motion, which means that

$$1 = \lim_{\delta \downarrow 0} P\left(\sup\left(B_s : s \in \left[0, \tfrac{1}{\delta^2}\right]\right) > 1\right) = P\left(\sup_{s \ge 0} B_s > 1\right),$$

again by continuity because $\frac{1}{\delta^2}$ goes to infinity. This means that $B$ will hit height 1 at some point with probability 1, and we can scale again to show that $B$ hits any height $a$ almost surely (for instance, we can multiply space by 10 and multiply time by 100).


In particular, if we run a Brownian motion for all time, it will not converge to anything, since it must hit height $a$, then height $-a$, and so on. That means it oscillates a lot, and thus Brownian motion is not particularly well-behaved compared to other processes we might have seen. To answer the question of “how regular Brownian motion is,” we'll present a few interesting results in this class but skip the more specialized ones. Our first result will be important to stochastic calculus: we know that Brownian motion is $(\frac{1}{2} - \epsilon)$-Hölder, and on homework 2 we'll see that

$$\limsup_{h \downarrow 0} \frac{|B_{t+h} - B_t|}{\sqrt{2h \log(1/h)}} = 1 \quad \text{a.s.}$$

This tells us that the $\frac{1}{2}$ exponent is basically correct, and this should generally make sense: we have that

$$|B_{t+h} - B_t| \sim \sqrt{h}\,|N(0, 1)|,$$

so we should expect the increment to be on the order of $\sqrt{h}$ (and the above statement shows that sometimes it is bigger).


Definition 33

A function $f : [a, b] \to \mathbb{R}$ is of bounded variation (BV) if (taking a supremum over all sets of break points between $a$ and $b$)

$$\sup_{P = \{t_i\}} \sum_{i=1}^n |f(t_i) - f(t_{i-1})| < \infty.$$

For a partition $P = \{t_i\}$, we'll define $V_P^1(f) = \sum_{i=1}^n |f(t_i) - f(t_{i-1})|$. Sufficiently nice functions will always be of bounded variation. For example, a $C^1$ function $f$ is of bounded variation because

$$V_P^1(f) \le \int_a^b |f'(t)|\, dt.$$

However, Brownian motion is not of bounded variation; we can read this in the book or try this ourselves. (If we partition our interval $[0, 1]$ into blocks of size $\epsilon$, we have $\frac{1}{\epsilon}$ intervals and each increment is on the order of $\sqrt{\epsilon}$, so this gives us something very large.) So we'll need a more suitable measure of variation instead. For instance, consider an $\alpha$-Hölder function $f$ (meaning that $|f(s) - f(t)| \le K|s - t|^\alpha$). Then one way we could measure its regularity is to consider

$$V_P^{1/\alpha}(f) = \sum_{i=1}^n |f(t_i) - f(t_{i-1})|^{1/\alpha} \le \sum_{i=1}^n K^{1/\alpha} |t_i - t_{i-1}| \le K^{1/\alpha}(b - a) < \infty$$

(by the $\alpha$-Hölder bound). This means that Brownian motion on a compact interval satisfies $V_P^{1/\alpha}(B) < \infty$ for all $\alpha < \frac{1}{2}$. But this is not a good measure of variation either: because we're raising the increments to a power greater than 2, the sum tends to 0 as our partition gets finer. Specifically, for any fixed $\alpha \in (0, \frac{1}{2})$, we know the Brownian motion is actually $\gamma$-Hölder for $\gamma \in (\alpha, \frac{1}{2})$, so taking $P$ to be a set of break points separated by $\epsilon$, we find that

$$V_P^{1/\alpha}(B) \lesssim \frac{1}{\epsilon} \cdot (\epsilon^\gamma)^{1/\alpha} = \epsilon^{\gamma/\alpha - 1},$$

which tends to 0. So raising the increments to any power larger than 2 doesn't work, and this motivates raising to exactly the second power:

$$V_P^2(B) = \sum_{i=1}^n (B_{t_i} - B_{t_{i-1}})^2.$$

This turns out to actually tend to a non-trivial limit:


Proposition 34

Suppose that $P_n$ is a subdivision of the interval $[0, t]$ for all $n$. Then $V^2_{P_n}(B)$ converges to $t$ in $L^2$ as the mesh of $P_n$ goes to 0.
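The contrast between first, second, and third powers of the increments can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: it uses a uniform partition of $[0, t]$, draws independent $N(0, t/n)$ increments with Python's `random.gauss`, and the helper names `brownian_increments` and `p_variation` are made up for this example.

```python
import random


def brownian_increments(t, n, rng):
    """Increments of a Brownian motion over a uniform partition of [0, t] into n pieces."""
    dt = t / n
    return [rng.gauss(0.0, dt ** 0.5) for _ in range(n)]


def p_variation(increments, p):
    """Sum of |increment|^p over the partition."""
    return sum(abs(dx) ** p for dx in increments)


rng = random.Random(0)
t = 2.0
for n in (100, 10_000, 100_000):
    inc = brownian_increments(t, n, rng)
    # columns: n, first variation, quadratic variation, cubic variation
    print(n, p_variation(inc, 1), p_variation(inc, 2), p_variation(inc, 3))
```

As the mesh shrinks, the first column of sums should grow, the sum of squares should settle near $t$, and the sum of cubes should shrink toward 0, in line with the heuristics above.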


Note that we have $L^2$ convergence but not almost sure convergence.

Proof. This is a simple variance calculation. First, break up the $t$ term into the lengths of our intervals as

$$\|V^2_{P_n}(B) - t\|_2^2 = E\left[\left(\sum_{i=1}^n \left[(B_{t_i} - B_{t_{i-1}})^2 - (t_i - t_{i-1})\right]\right)^2\right].$$

Each term has mean 0, because the expected value of each squared increment is the length of the interval. In addition, the increments are independent, so if we expand the square and take expectations, the cross-terms go away and we're left with

$$\|V^2_{P_n}(B) - t\|_2^2 = \sum_{i=1}^n (t_i - t_{i-1})^2\, E[(Z^2 - 1)^2]$$

where $Z$ is a standard normal. Now the expectation term is just some constant, meaning we can bound this squared $L^2$ norm from above by $E[(Z^2 - 1)^2] \cdot \text{mesh}(P_n) \sum_{i=1}^n (t_i - t_{i-1}) = E[(Z^2 - 1)^2] \cdot \text{mesh}(P_n) \cdot t$, which goes to 0 as $n \to \infty$ by assumption.
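For reference, the constant in this bound can be made explicit using the standard normal moments $E[Z^2] = 1$ and $E[Z^4] = 3$. The following worked computation is a supplement to the proof, not part of the lecture's argument:

```latex
\begin{align*}
E[(Z^2 - 1)^2] &= E[Z^4] - 2E[Z^2] + 1 = 3 - 2 + 1 = 2, \\
\|V^2_{P_n}(B) - t\|_2^2 &= 2 \sum_{i=1}^n (t_i - t_{i-1})^2 \le 2 \cdot \mathrm{mesh}(P_n) \cdot t.
\end{align*}
```

For example, for the uniform partition with $t_i - t_{i-1} = t/n$, the squared $L^2$ error is exactly $2t^2/n$.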


This result is important, because the limit Ve (B) has to do with sum of squares of Brownian motion on small 
intervals, which is a good way of understanding an expression like ids.) (as we'll do later on in the class). 

For our next result, consider the natural filtration $\mathcal{F}_t = \sigma(B_s : s \le t)$ as before, and define $\mathcal{F}_\infty = \sigma(B_s : s \ge 0)$ to be the sigma-algebra of “everything we know about the Brownian motion.” Now let $\tau$ be a stopping time (meaning that $\{\tau \le t\} \in \mathcal{F}_t$ for all $t$), and define the stopping-time sigma-algebra

$$\mathcal{F}_\tau = \{A \in \mathcal{F}_\infty : A \cap \{\tau \le t\} \in \mathcal{F}_t \text{ for all } t\}.$$

In words, $\mathcal{F}_\tau$ captures everything we know about the process up until our random stopping time.

Proposition 35 (Markov property, strong version)

Let $\tau$ be a stopping time. Then $(B_{\tau+t} - B_\tau)_{t \ge 0}$ is a Brownian motion, and it is independent of $\mathcal{F}_\tau$ under the measure $P$ where we condition on $\tau$ being finite.

We can read the proof in the book (it involves approximating the vector of Brownian motion values with dyadic rationals). One of the most important applications for this is the reflection principle: suppose we want to compute the value

$$P(B_t \ge a \text{ for some } t \in [0, 1]).$$

The idea is to consider a height $b < a$ and ask about the probability that we do exceed $a$ but end below $b$ at time 1. To answer that question, we can reflect the Brownian motion after the stopping time when we hit $a$, which shows that we just need to end above $a - (b - a) = 2a - b$. In other words, if $S_t = \sup(B_s : s \le t)$,

$$P(S_1 \ge a, B_1 \le b) = P(S_1 \ge a, B_1 \ge 2a - b) = P(B_1 \ge 2a - b).$$

But $B_1$ is just a standard normal, so this tells us everything we might want to know about the supremum process! We'll cover this in more detail next time.


5 February 18, 2020 


At the end of last time, we discussed the reflection principle briefly, and we'll elaborate on that discussion now. Recall the strong Markov property for Brownian motion, which tells us that for any stopping time $\tau$, $(B_{\tau+t} - B_\tau)_{t \ge 0}$ is a Brownian motion independent of $\mathcal{F}_\tau$ (the stopping-time sigma-algebra) if we condition on $\tau$ being finite. This is useful, for example, if we want to consider the supremum process

$$S_t = \sup\{B_s : 0 \le s \le t\},$$

which is nondecreasing in $t$.


Theorem 36

For any $a > 0$ and $b \in (-\infty, a]$, we have

$$P(S_t \ge a, B_t \le b) = P(B_t \ge 2a - b).$$

The point is that this gives us a joint distribution between $S_t$ and $B_t$ in terms of something that is completely explicit, since $B_t$ is just a normal random variable with mean 0 and variance $t$.

Proof. Apply the strong Markov property to the stopping time $\tau = \inf\{s \ge 0 : B_s \ge a\}$ (the first time we hit $a$) to see that $W = (B_{\tau+s} - B_\tau)_{s \ge 0}$ is a Brownian motion independent of the process up to time $\tau$. Now we can explicitly write down

$$P(S_t \ge a, B_t \le b) = P(\tau \le t, W_{t-\tau} \le -(a - b))$$

because both sides tell us the probability of hitting $a$ at some point and then going down by $(a - b)$ to get back below $b$. Using the reflection principle and noting that $W$ is symmetric in law, the previous probability can also be written as

$$P(\tau \le t, W_{t-\tau} \ge a - b) = P(S_t \ge a, B_t \ge 2a - b).$$

But $B_t \ge 2a - b$ means in particular that $S_t \ge B_t \ge a$, so we can drop the $S_t \ge a$ assumption and get the stated result.


In particular, this means the joint density of $S_t$ and $B_t$ is given by $-\frac{\partial^2}{\partial a\, \partial b} P(B_t \ge 2a - b)$ (negative sign because we have $S_t \ge a$ but $B_t \le b$), and this density is supported on $\{a > 0, b < a\}$. And now we can take this density and integrate out the $b$ to find the marginal law of $S_t$, but there's a faster way: decompose as

$$P(S_t \ge a) = P(S_t \ge a, B_t \ge a) + P(S_t \ge a, B_t \le a) = P(B_t \ge a) + P(B_t \ge 2a - a) = P(|B_t| \ge a),$$

because the law of $B_t$ is symmetric. So this means that for any fixed $t$, $S_t$ is equally distributed as $|B_t|$. However, do note that the processes

$$(S_t)_{t \ge 0}, \qquad (|B_t|)_{t \ge 0}$$

are not identically distributed, because $S_t$ is nondecreasing while $|B_t|$ returns to 0 infinitely many times.
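Since everything here reduces to normal tail probabilities, the formulas are easy to evaluate numerically. The following Python sketch (with hypothetical helper names `normal_tail`, `joint_sup_end_prob`, and `sup_tail`) computes the two probabilities above via `math.erfc` and checks the decomposition used for the marginal law of $S_t$.

```python
import math


def normal_tail(x):
    """P(Z >= x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def joint_sup_end_prob(a, b, t):
    """P(S_t >= a, B_t <= b) = P(B_t >= 2a - b), valid for a > 0 and b <= a."""
    if a <= 0 or b > a:
        raise ValueError("formula requires a > 0 and b <= a")
    return normal_tail((2.0 * a - b) / math.sqrt(t))


def sup_tail(a, t):
    """P(S_t >= a) = P(|B_t| >= a) for a > 0."""
    return 2.0 * normal_tail(a / math.sqrt(t))


a, t = 1.0, 1.0
# decomposition: P(S_t >= a) = P(B_t >= a) + P(S_t >= a, B_t <= a)
print(sup_tail(a, t), normal_tail(a / math.sqrt(t)) + joint_sup_end_prob(a, a, t))
```

Both printed numbers agree, which is just the identity $P(S_t \ge a) = 2P(B_t \ge a)$ in numerical form.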


Remark 37. On our homework, we'll discover a bit more: it turns out that $S_t$ and $|B_t|$ are both equally distributed as $S_t - B_t$ for any fixed $t$, and $(S_t - B_t)$ is also equally distributed as $|B_t|$ as a process.


Note that this calculation also gives us the law of the first time we hit $a$: if $\sigma_a = \inf\{t \ge 0 : B_t = a\}$, then by Brownian scaling

$$\sigma_a \overset{d}{=} a^2 \sigma_1,$$

but we can get the exact distribution via the calculation

$$P(\sigma_a > t) = P(S_t < a) = P(|B_t| < a) = P\left(|Z| < \frac{a}{\sqrt{t}}\right) = P\left(\frac{a^2}{Z^2} > t\right),$$

where $Z$ is a standard normal. Comparing the left and right sides, we see that $\sigma_a$ is equally distributed as $\frac{a^2}{Z^2}$, so in particular $E[\sigma_a]$ is infinite for all $a > 0$. Taking $a = 1$, this means that $\sigma_1$ is distributed as $\frac{1}{Z^2}$. We can then calculate the density of $\sigma_1$ explicitly (left as an exercise), and find that for large $t$,

$$P(\sigma_1 > t) \asymp \frac{1}{\sqrt{t}},$$

which decays quite slowly. But now we can compare this with the stopping time $\tau = \inf\{t : |B_t| \ge 1\}$ (so now asking for the first time we hit either 1 or $-1$); it will turn out that

$$P(\tau > t) \le \exp(-\Omega(t)),$$

which decays much more quickly. To understand this exponential decay intuitively, suppose that we run Brownian motion for a very long time and we want it to stay confined in the interval $[-1, 1]$. Looking at successive intervals of length 1, there is always some positive chance it leaves in each time interval, and conditioned on staying inside $[-1, 1]$ so far, this probability is bounded from below. But again, we'll be more precise on the homework.
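The slow $1/\sqrt{t}$ decay of the one-sided hitting time can be checked directly from the formula $P(\sigma_a > t) = P(|Z| < a/\sqrt{t})$. Here is a short Python sketch; the function name `hitting_time_tail` is an illustrative choice, and the limiting constant $\sqrt{2/\pi}$ comes from the small-argument behavior of the normal density at 0.

```python
import math


def hitting_time_tail(a, t):
    """P(sigma_a > t) = P(|Z| < a / sqrt(t)) for a standard normal Z."""
    return math.erf(a / math.sqrt(2.0 * t))


for t in (1e2, 1e4, 1e6):
    # sqrt(t) * P(sigma_1 > t) should approach sqrt(2 / pi)
    print(t, hitting_time_tail(1.0, t) * math.sqrt(t))

print(math.sqrt(2.0 / math.pi))
```

The rescaled tail stabilizes near $\sqrt{2/\pi}$, which is consistent with $E[\sigma_1]$ being infinite: a tail of order $t^{-1/2}$ is not integrable.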

This is all we'll cover from chapter 2 of Le Gall — now we'll move on to chapter 3, which discusses continuous- 
time martingales. Unfortunately, it's pretty boring: under mild assumptions, all the properties from discrete-time 
martingales hold. So we'll go through this fairly quickly — if we took 18.675 in a different semester where we didn’t 
cover martingales in such detail, we might have to do reading on our own. 

Brownian motion $B_t$ is an example of a continuous-time martingale, and here's another example to keep in mind as well: let $\zeta, \zeta_i$ be iid exponential random variables (of rate 1), and let

$$N_t = \max\left\{n : \sum_{i=1}^n \zeta_i \le t\right\}.$$

Then $N_t$ is distributed as $\text{Pois}(t)$, and it is an integer-valued process with right-continuous sample paths (but discontinuous jumps). It will turn out that $N_t - t$ is a continuous-time martingale as well, so our formalism should be able to study it. (Generally, we'll be assuming that the processes we are studying are right-continuous.) Throughout this chapter, everything will live on a probability space $(\Omega, \mathcal{F}, P)$ with a filtration

$$(\mathcal{F}_t)_{0 \le t \le \infty} : \quad \mathcal{F}_s \subseteq \mathcal{F}_t \quad \forall s \le t.$$

We'll call $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ a filtered probability space.
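As a sanity check on the Poisson example, the following Python sketch simulates $N_t$ directly from its definition as a count of rate-1 exponential arrival times and estimates the mean and variance of $N_t$. The function name `poisson_count`, the seed, and the sample size are all arbitrary choices for this illustration.

```python
import random


def poisson_count(t, rng):
    """N_t = max{n : zeta_1 + ... + zeta_n <= t} for i.i.d. rate-1 exponentials zeta_i."""
    n = 0
    elapsed = rng.expovariate(1.0)  # time of the first arrival
    while elapsed <= t:
        n += 1
        elapsed += rng.expovariate(1.0)
    return n


rng = random.Random(0)
t, trials = 3.0, 20_000
samples = [poisson_count(t, rng) for _ in range(trials)]
mean = sum(samples) / trials
variance = sum((x - mean) ** 2 for x in samples) / trials
print(mean, variance)  # both should be close to t for a Pois(t) variable
```

A sample mean of $N_t$ close to $t$ is what we expect if $N_t - t$ has mean zero, as any martingale started at 0 must.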


Definition 38 


A process $(X_t)_{t \ge 0}$ is adapted to a filtration $\mathcal{F}_t$ if $X_t \in \mathcal{F}_t$ (that is, $X_t$ is $\mathcal{F}_t$-measurable) for all $t$.

We'll also reiterate the following definition from earlier:

Definition 39

A random variable $\tau : \Omega \to [0, \infty]$ is a stopping time if $\{\tau \le t\} \in \mathcal{F}_t$ for all $t$ (that is, whether we stop doesn't depend on what happens in the future). The $\sigma$-field of the past up to $\tau$ is

$$\mathcal{F}_\tau = \{A \in \mathcal{F}_\infty : A \cap \{\tau \le t\} \in \mathcal{F}_t \text{ for all } t\}.$$

So an event that only depends on time up to $\tau$ can be rephrased as “only needing information up to $t$ if $\tau \le t$.”

We should read all of the basic facts about filtrations and stopping times on our own (sections 3.1 and 3.2 of our book). An example of what we'll see is that if $\sigma, \tau$ are both stopping times, then their minimum $\sigma \wedge \tau$ and maximum $\sigma \vee \tau$ are also stopping times, and

$$\mathcal{F}_{\sigma \wedge \tau} = \mathcal{F}_\sigma \cap \mathcal{F}_\tau.$$

(This specific fact is useful for one of the questions on our homework.)


Definition 40 
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ be a filtered probability space, and let $(X_t)_{t \ge 0}$ be a real-valued process. Then $(X_t)$ is a submartingale if the following properties hold:

• $X_t \in \mathcal{F}_t$ for all $t$ (in other words, the process is adapted to the filtration),

• $E[|X_t|]$ is finite for all $t$ (the variable is integrable),

• For every $0 \le s \le t$, we have $X_s \le E[X_t | \mathcal{F}_s]$; we similarly say that $(X_t)$ is a supermartingale if we flip the inequality.

A martingale is both a submartingale and a supermartingale, meaning that $E[X_t]$ is constant for a martingale, nondecreasing for a submartingale, and nonincreasing for a supermartingale.


Example 41

Let $Z \in L^1(\Omega, \mathcal{F}, P)$ be any integrable random variable. Then we can check that $X_t = E[Z | \mathcal{F}_t]$ is a martingale.

The fact that $X_s = E[X_t | \mathcal{F}_s]$ follows from basic properties of the conditional expectation, and each $X_t$ is integrable because

$$E[|X_t|] = E\big[|E[Z | \mathcal{F}_t]|\big] \le E\big[E[|Z| \,|\, \mathcal{F}_t]\big] = E[|Z|] < \infty$$

by Jensen's inequality.

Remark 42. The standard Brownian motion $B_t$ is a martingale (it satisfies all properties in the definition), but there is no random variable $Z$ such that $B_t = E[Z | \mathcal{F}_t]$ because $E[|B_t|] = t^{1/2} E|N(0, 1)|$ is unbounded as $t \to \infty$.

There are some other important martingales based on Brownian motion as well: we can check that $B_t^2 - t$ is a martingale, as is $\exp\left(\theta B_t - \frac{\theta^2}{2} t\right)$. We'll use the rest of this lecture to prove some basic results about general martingales:


Proposition 43

Let $X_t$ be a (sub/super)martingale. Then for any $t < \infty$, we have

$$\sup\{E[|X_s|] : 0 \le s \le t\} < \infty.$$

We didn't have to prove this in the discrete case, because we only had a finite number of variables to consider between 0 and $t$, and we know that $E[|X_s|]$ is finite at any given time $s$.

Proof. Without loss of generality, say that $X$ is a submartingale. Then $(X_t)_+ = \max\{X_t, 0\}$ is a submartingale (because $f(x) = \max(x, 0)$ is a convex nondecreasing function), so $E[(X_t)_+]$ is nondecreasing. Therefore, for any $s \le t$,

$$E[|X_s|] = E[2(X_s)_+ - X_s] \le E[2(X_t)_+ - X_0].$$

This bound holds uniformly over $s$, so the supremum of $E[|X_s|]$ must indeed be finite.

This next fact is a weak version of the optional stopping theorem for discrete-time submartingales, and we saw this in 18.675:


Lemma 44

Let $X_n$ be a discrete-time submartingale, and let $\tau$ be a bounded stopping time such that $\tau \le n$ almost surely. Then $E[X_0] \le E[X_\tau] \le E[X_n]$.

Proof. Consider the stopped process $Y_k = X_{k \wedge \tau}$. This is also a submartingale, and

$$E[X_0] = E[Y_0] \le E[Y_n] = E[X_{n \wedge \tau}] = E[X_\tau]$$

because $\tau \le n$ almost surely. For the other inequality, write

$$E[X_\tau] = \sum_{k=0}^n E[1\{\tau = k\} X_k] \le \sum_{k=0}^n E\big[1\{\tau = k\} E[X_n | \mathcal{F}_k]\big]$$

by the submartingale condition, and now we can put $1\{\tau = k\}$ inside the conditional expectation because it is measurable with respect to $\mathcal{F}_k$. Thus,

$$E[X_\tau] \le \sum_{k=0}^n E\big[E[1\{\tau = k\} X_n | \mathcal{F}_k]\big] = \sum_{k=0}^n E[1\{\tau = k\} X_n] = E[X_n],$$

as desired.
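For a small concrete instance of this lemma, the sketch below takes the simple random walk $S_k$ started at 0, the submartingale $X_k = S_k^2$, and the bounded stopping time $\tau = \min(n, \text{first } k \text{ with } |S_k| \ge \text{level})$, and computes all three expectations exactly by enumerating the $2^n$ equally likely paths. The choice of submartingale, the stopping rule, and the function name are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import product


def optional_stopping_check(n, level):
    """Return (E[X_0], E[X_tau], E[X_n]) for X_k = S_k^2 and tau = min(n, first k with |S_k| >= level)."""
    weight = Fraction(1, 2 ** n)  # each +-1 path of length n is equally likely
    e_tau = Fraction(0)
    e_n = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        s = 0
        stopped_value = None
        for step in steps:
            s += step
            if stopped_value is None and abs(s) >= level:
                stopped_value = s * s  # value of X at the stopping time
        if stopped_value is None:
            stopped_value = s * s  # the level was never reached, so tau = n
        e_tau += weight * stopped_value
        e_n += weight * s * s
    return Fraction(0), e_tau, e_n


print(optional_stopping_check(6, 2))
```

With $n = 6$ and level 2, the three expectations come out ordered as the lemma predicts, and $E[X_n] = n$ because $S_k^2 - k$ is a martingale.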


Similarly, we also have the following result: 


Proposition 45 (Maximal inequality, discrete version)

Let $Y_n$ be a discrete-time (sub/super)martingale. Then for all $\lambda > 0$,

$$\lambda P\left(\max_{0 \le k \le n} |Y_k| \ge \lambda\right) \le E[|Y_0| + 2|Y_n|].$$

This tells us that we have control of the entire trajectory up to time $n$ just by knowing something about the process at the beginning and end.

Proof. Without loss of generality, assume $Y_n$ is a supermartingale. Let $A$ be the event that $\max_{0 \le k \le n} |Y_k| \ge \lambda$, and consider the stopping time

$$\tau = \min\{k : |Y_k| \ge \lambda \text{ or } k = n\} \in \{0, 1, \cdots, n\}.$$

Recall that the notation $E[X; A]$ means $E[X \cdot 1_A]$. We have

$$\lambda P\left(\max_{0 \le k \le n} |Y_k| \ge \lambda\right) \le E[|Y_\tau|; A],$$

because $|Y_\tau| \ge \lambda$ whenever the event $A$ occurs. The right-hand side can now be bounded as

$$E[|Y_\tau|; A] \le E[|Y_\tau|] = E[Y_\tau + 2(Y_\tau)_-],$$

and now since $Y_n$ is a supermartingale and $(Y_n)_-$ is a submartingale, we can upper bound this quantity by $E[Y_0 + 2(Y_n)_-]$, which is at most the right-hand side.

Remark 46. We can get a small improvement in Proposition 45 if we assume that $X$ is a martingale. Then we can do the decomposition before removing the indicator on $A$:

$$\lambda P\left(\max_{0 \le k \le n} |X_k| \ge \lambda\right) \le E[|X_\tau|; A] = E[(X_\tau)_+ + (X_\tau)_-; A],$$

and because $X$ is a martingale, $(X_k)_+$ and $(X_k)_-$ are both submartingales, so this is actually bounded by $E[|X_n|; A]$.
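To see the maximal inequality in action, here is a Python sketch that evaluates both sides exactly for the simple random walk $Y_k = S_k$ started at $Y_0 = 0$ (a martingale, so Proposition 45 applies), again by enumerating all $2^n$ paths. The function name and the choice of process are assumptions for this illustration.

```python
from fractions import Fraction
from itertools import accumulate, product


def maximal_inequality_sides(n, lam):
    """Return (lam * P(max_k |Y_k| >= lam), E[|Y_0| + 2|Y_n|]) for simple random walk Y with Y_0 = 0."""
    weight = Fraction(1, 2 ** n)
    prob_max = Fraction(0)
    expected_abs_end = Fraction(0)
    for steps in product((-1, 1), repeat=n):
        path = [0, *accumulate(steps)]  # Y_0, Y_1, ..., Y_n
        if max(abs(y) for y in path) >= lam:
            prob_max += weight
        expected_abs_end += weight * abs(path[-1])
    return lam * prob_max, 2 * expected_abs_end  # |Y_0| = 0 contributes nothing


for lam in (1, 2, 3, 4):
    print(lam, maximal_inequality_sides(8, lam))
```

In each printed pair, the first entry (the left-hand side) does not exceed the second (the right-hand side), even though the right-hand side only looks at the endpoint of the walk.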


We can now generalize this result to continuous-time martingales: 


Proposition 47 (Maximal inequality, continuous version)

Let $X_t$ be a (sub/super)martingale with right-continuous sample paths (this is a regularity condition). Then

$$\lambda P\left(\sup_{s \le t} |X_s| \ge \lambda\right) \le E[|X_0| + 2|X_t|].$$

Proof. Fix $t$. Then any sequence $0 = t_0 < t_1 < t_2 < \cdots < t_m = t$ gives a discrete-time (sub/super)martingale $(X_{t_k})_k$, so by Proposition 45, we have

$$\lambda P\left(\max_{0 \le k \le m} |X_{t_k}| \ge \lambda\right) \le E[|X_0| + 2|X_t|].$$

Now consider a sequence of such time sequences $D_m \uparrow D$, where $D_1 = \{t_0 = 0, t_1 = t\}$ and the $D_m$ are nested and increase to a countable dense subset $D$ of $[0, t]$. We find that

$$\lambda P\left(\sup_{s \in [0,t] \cap D} |X_s| \ge \lambda\right) \le E[|X_0| + 2|X_t|],$$

and right-continuity allows us to replace $[0, t] \cap D$ with $[0, t]$, yielding the desired result.


Next, we'll prove a few more inequalities that are weaker but easier to package and remember:

Proposition 48 (Doob's $L^p$ inequality, discrete version)

Let $X_n$ be a discrete-time martingale. Then for all $p > 1$ and finite $n$, we have (letting $C_p = \frac{p}{p-1}$)

$$\left\| \max_{0 \le k \le n} |X_k| \right\|_p \le C_p \|X_n\|_p.$$

Proposition 49 (Doob's $L^p$ inequality, continuous version)

Let $X_t$ be a martingale with right-continuous sample paths. Then for all $p > 1$ and finite $t$, we have (letting $C_p = \frac{p}{p-1}$)

$$\left\| \sup_{0 \le s \le t} |X_s| \right\|_p \le C_p \|X_t\|_p.$$

Here, the discrete version again implies the continuous version by the same argument as above, so we'll just prove the discrete version.


Proof of Proposition 48. With the same event $A$ as before, we take the inequality

$$\lambda P\left(\max_{k \le n} |X_k| \ge \lambda\right) \le E[|X_n|; A]$$

from Remark 46. We wish to bound the $L^p$ norm of $S_n = \max_{0 \le k \le n} |X_k|$, which we will do using the formula

$$E[(S_n)^p] = \int_0^\infty P(S_n^p \ge t)\, dt = \int_0^\infty p y^{p-1} P(S_n \ge y)\, dy$$

(last step by a change of variables $t = y^p$). Plugging in the inequality from Remark 46 yields

$$E[(S_n)^p] \le \int_0^\infty p y^{p-1} \cdot \frac{1}{y} E[|X_n|; S_n \ge y]\, dy,$$

and by Fubini's theorem, we can change the order of integration to rewrite this as $E\left[|X_n| \int_0^{S_n} p y^{p-2}\, dy\right]$, where the indicator that $S_n \ge y$ is accounted for by replacing $\infty$ with $S_n$. Integrating directly and then using Hölder's inequality, we find that

$$E[(S_n)^p] \le \frac{p}{p-1} E\left[|X_n| S_n^{p-1}\right] \le \frac{p}{p-1} \|X_n\|_p \left\|(S_n)^{p-1}\right\|_{\frac{p}{p-1}} = \frac{p}{p-1} \|X_n\|_p \|S_n\|_p^{p-1}.$$

We can now divide through by $\|S_n\|_p^{p-1}$ to get the desired result if the norm is finite, and otherwise we use a standard truncation argument to finish the proof.


However, it's important to remember that there's no $L^p$ inequality for $p = 1$. For example, consider the simple random walk $X_n$ on the integers starting from $X_0 = 1$, let $\tau$ be the first time $n$ such that $X_n = 0$, and let $M_n = X_{n \wedge \tau}$. This yields a nonnegative martingale with $E[M_n] = 1$, but we can't control the maximum of this process. Indeed, if we define $S_n = \max_{k \le n} M_k$, then $\|S_n\|_1$ is unbounded as $n \to \infty$. To see this, it's enough to show that $S_\infty$ has infinite expectation: the probability that $S_\infty$ is at least $a$ is the chance that a random walk hits $a$ before 0, which is $\frac{1}{a}$ (by the optional stopping theorem for discrete-time martingales, for example). This is not summable over $a$, so the expectation is indeed infinite.
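The divergence can be made quantitative. Since $S_\infty$ takes positive integer values, $E[\min(S_\infty, m)] = \sum_{a=1}^m P(S_\infty \ge a) = \sum_{a=1}^m \frac{1}{a}$, which is the $m$-th harmonic number. The Python sketch below (with an illustrative function name) evaluates this truncated expectation exactly, taking the value $P(S_\infty \ge a) = 1/a$ from the text as given.

```python
from fractions import Fraction


def truncated_max_expectation(m):
    """E[min(S_inf, m)] = sum_{a=1}^m P(S_inf >= a), using P(S_inf >= a) = 1/a."""
    return sum(Fraction(1, a) for a in range(1, m + 1))


for m in (10, 100, 1000):
    print(m, float(truncated_max_expectation(m)))
```

The truncated expectations grow like $\log m$, so they are unbounded even though $E[M_n] = 1$ for every $n$.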


6 February 19, 2020 


Yesterday, we defined what it means for $(X_t)_{t \ge 0}$ to be a continuous-time (sub/super)martingale on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. We then showed the maximal inequality

$$\lambda P\left(\sup_{0 \le s \le t} |X_s| \ge \lambda\right) \le E[|X_0| + 2|X_t|]$$

assuming that the process $X$ has right-continuous sample paths. (There's a slightly stronger result for martingales, which is that $\lambda P(A) \le E[|X_t|; A]$ for an event of the type $A = \{\sup_{0 \le s \le t} |X_s| \ge \lambda\}$.) We also used this to show the $L^p$ inequality by integrating the previous result to find that

$$\left\| \sup_{0 \le s \le t} |X_s| \right\|_p \le \frac{p}{p-1} \|X_t\|_p$$

for all $p > 1$ and $t \in [0, \infty)$. This requires right-continuous sample paths. If we don't assume that fact, we actually only prove that

$$\lambda P\left(\sup_{s \in [0,t] \cap D} |X_s| \ge \lambda\right) \le E[|X_0| + 2|X_t|]$$

for a countable dense set $D$. We'll build off of this today: in particular, we'll show that under mild conditions, a (sub/super)martingale has a right-continuous modification.

Definition 50

A function $f$ is right continuous with left limits (also rcll or càdlàg for short) if for all $t \ge 0$, $f(t) = \lim_{s \downarrow t} f(s)$, and for all $t > 0$, $\lim_{s \uparrow t} f(s)$ exists.


The main idea is that martingales can't oscillate too much, so we can guarantee existence of limits. We'll start with a deterministic result by controlling upcrossing numbers: for any subset $I \subset [0, \infty)$ and any $a < b$, denote by $U^f_{a,b}(I)$ the maximum $k$ such that there exist times $s_1 < t_1 < s_2 < t_2 < \cdots < s_k < t_k$ in $I$ such that $f(s_i) < a$ and $f(t_i) > b$ for all $i$. (In other words, this is the maximum number of times that we go from $a$ to $b$.) Basically, if we can control upcrossing numbers, we have some regularity control:

Lemma 51

Let $D$ be a countable dense subset of $[0, \infty)$, and suppose we have a function $f : D \to \mathbb{R}$ that is locally bounded, meaning that $\sup\{|f(t)| : t \in [0, T] \cap D\} < \infty$ for all $T \in D$. Also, suppose that $U^f_{a,b}(D \cap [0, T]) < \infty$ for all $T \in D$ and for all rational $a < b$ (to avoid issues with measurability). Then $f$ has all of its left and right limits, and the function

$$g(t) = f(t+) = \lim_{s \downarrow t,\, s \in D} f(s)$$

is rcll.

Proof. Take any $t \ge 0$, and suppose for the sake of contradiction that the (WLOG) right limit $\lim_{s \downarrow t,\, s \in D} f(s)$ does not exist. This means that the limsup and liminf along this limit are different, so there exist rational $a, b$ with

$$\liminf_{s \downarrow t,\, s \in D} f(s) < a < b < \limsup_{s \downarrow t,\, s \in D} f(s).$$

But this means the function $f$ must cross between $a$ and $b$ infinitely many times, which is a contradiction with the assumption that $U^f_{a,b}([0, T] \cap D)$ is finite. Showing that the function $g$ is rcll follows from a similar argument.


This basically tells us that we need to control upcrossing numbers, and we can do so using the following idea: 


Definition 52
Let $X_n$ be an adapted discrete process, meaning that $X_n \in \mathcal{F}_n$ for all $n$, and let $H_n$ be a previsible process, meaning that $H_n \in \mathcal{F}_{n-1}$ for all $n$. Then the Doob transform is defined via

\[ (H \cdot X)_n = \sum_{k=1}^{n} H_k (X_k - X_{k-1}). \]


Notably, if $X$ is a supermartingale and $H$ is a nonnegative bounded previsible process, then $H \cdot X$ is also a supermartingale. We can check this from the definition, but what it's really saying is that if there is a gambling system $X$ where we can't win, then even with a betting strategy $H$ (to tell us how much to bet in the next game), we can't game the expected gain $H \cdot X$.
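As a sanity check of the "can't game the system" statement, the sketch below enumerates every path of a fair coin-flip walk and computes the exact expected gain of a previsible strategy. The doubling strategy in `previsible_bets` and all function names are assumptions chosen for illustration; any bounded strategy that only looks at the past should give the same answer.

```python
from itertools import product

def doob_transform(bets, path):
    """Return [(H.X)_0, ..., (H.X)_n], where bets[k-1] = H_k and path[k] = X_k."""
    gains = [0]
    for k in range(1, len(path)):
        gains.append(gains[-1] + bets[k - 1] * (path[k] - path[k - 1]))
    return gains

def previsible_bets(path):
    """H_k uses only X_0..X_{k-1}: double the stake after a down-step, else bet 1."""
    bets = []
    stake = 1
    for k in range(1, len(path)):
        if k >= 2:
            stake = 2 * stake if path[k - 1] < path[k - 2] else 1
        bets.append(stake)
    return bets

def expected_gain(n):
    """Exact E[(H.X)_n] for the fair +-1 walk, averaging over all 2^n paths."""
    total = 0
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)
        total += doob_transform(previsible_bets(path), path)[-1]
    return total / 2 ** n
```

Here the walk is a martingale, so the expected gain is exactly zero for every horizon; for a supermartingale the same computation would give a nonpositive value.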


Lemma 53 (Doob's upcrossing inequality)
Let $X_n$ be a discrete supermartingale. Then the expected number of upcrossings satisfies

\[ E\big[ U^X_{a,b}([0,n]) \big] \le \frac{E[(X_n - a)^-]}{b-a}. \]

Proof. We'll study the value $Y$ we get from betting on $X$ only during upcrossings. More formally, let $Y = H \cdot X$, where

\[ H_i = 1\{ i \in (\sigma_j, \tau_j] \text{ for some } j \} \]

and where $\sigma_j$ and $\tau_j$ are stopping times corresponding to the $j$th time $X$ is below $a$ and above $b$, respectively. Note that $H_i$ is in $\mathcal{F}_{i-1}$, because

\[ \{ i \in (\sigma_j, \tau_j] \} = \{ \sigma_j \le i-1 \} \cap \{ \tau_j \le i-1 \}^c, \]

and both of the events $\{\sigma_j \le i-1\}$ and $\{\tau_j \le i-1\}$ are $\mathcal{F}_{i-1}$-measurable by the definition of a stopping time. Thus $Y = H \cdot X$ is a supermartingale with $Y_0 = 0$, so $E[Y_n] \le 0$. On the other hand, we know that $Y_n$ gets a contribution of at least $b - a$ from each completed upcrossing, and then there's an extra term from the end where we start an upcrossing but we go very far down at the end. In the worst case we can lose $(X_n - a)^-$, so

\[ Y_n \ge (b-a)\, U^X_{a,b}([0,n]) - (X_n - a)^-. \]

Taking expectations of both sides, we have

\[ 0 \ge E[Y_n] \ge (b-a)\, E\big[ U^X_{a,b}([0,n]) \big] - E[(X_n - a)^-], \]

and rearranging gives the desired bound.
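The pathwise inequality $Y_n \ge (b-a)\,U - (X_n - a)^-$ is deterministic, so it can be checked by brute force. The following is a sketch under the assumption that $X$ is a $\pm 1$ walk started at 0; the helper name `upcrossing_gain_and_count` is hypothetical.

```python
from itertools import product

def upcrossing_gain_and_count(path, a, b):
    """Bet one unit on each step taken during an upcrossing attempt.

    Betting starts on the step after the path is seen strictly below a and
    stops right after the path is strictly above b, so the decision to bet on
    step k uses only path[0..k-1]. Returns (Y_n, number of completed upcrossings).
    """
    gain, count, betting = 0, 0, False
    for k in range(1, len(path)):
        if not betting and path[k - 1] < a:
            betting = True
        if betting:
            gain += path[k] - path[k - 1]
            if path[k] > b:
                count += 1
                betting = False
    return gain, count

def inequality_holds_on_all_paths(n, a, b):
    """Check Y_n >= (b - a) * U - (X_n - a)^- on every +-1 path of length n."""
    for steps in product((-1, 1), repeat=n):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)
        gain, count = upcrossing_gain_and_count(path, a, b)
        if gain < (b - a) * count - max(a - path[-1], 0):
            return False
    return True
```

The check says nothing about expectations; it only confirms the bookkeeping step of the proof, after which the supermartingale property of $H \cdot X$ does the rest.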


Corollary 54
Let $X_t$ be a supermartingale and let $D$ be a countable dense subset of $[0,\infty)$. Then there is a probability-zero event $N$ such that for all $\omega \notin N$, the function $t \mapsto X_t(\omega)$ on $D$ satisfies the hypotheses of Lemma 51.


Proof. The first property (locally bounded) follows by the maximal inequality

\[ \lambda\, P\Big( \sup_{s \in [0,t] \cap D} |X_s| \ge \lambda \Big) \le E(|X_0| + 2|X_t|) < \infty \]

and taking $\lambda \to \infty$ (which shows that we cannot have a positive probability of going off to infinity). The second property (number of upcrossings is finite) follows from Lemma 53 plus an approximation argument. Indeed, for any finite subset $D_n$ of $D$, Lemma 53 tells us that

\[ E\big[ U^X_{a,b}([0,t] \cap D_n) \big] \le \frac{E[(X_t - a)^-]}{b-a} \]

because $X_t$ is a supermartingale. Now taking $D_n$ nested and increasing to $D$ yields a nondecreasing number of upcrossings, but the expectations are always uniformly bounded by the right-hand side. By monotone convergence, the limiting number of upcrossings has finite expectation, so even in the limit we must have finitely many upcrossings almost surely.


Remember that our goal is to turn our process $X_t$ into a modification $\tilde{X}_t$ such that $P(X_t = \tilde{X}_t) = 1$ for all $t$. What we've proved so far suggests that we should do this by taking limits from the right. However, while we know that $t \mapsto X_t(\omega)$ has left and right limits, we still need to check that taking limits from the right to get an rcll function actually yields a modification of $X_t$. For example, let $f$ be a deterministic nonincreasing function and let $X_t = f(t)$. Then this is a supermartingale, but if $f$ is not right-continuous, then there's no way for us to modify it to get right-continuous sample paths.


So from here, we'll need to use two facts: first of all, we showed last time that if $X_t$ is a (sub/super)martingale, then $\sup\{E[|X_s|] : 0 \le s \le t\}$ is finite for all finite $t$. Also, we'll need some theory of backwards (sub/super)martingales, which are indexed by $\mathbb{Z}_{\le 0}$ instead of $\mathbb{Z}_{\ge 0}$. In such a process, we have sigma-algebras $\cdots \subseteq \mathcal{F}_{-3} \subseteq \mathcal{F}_{-2} \subseteq \mathcal{F}_{-1} \subseteq \mathcal{F}_0$, and a supermartingale now satisfies an inequality of the form $Y_{-10} \ge E[Y_{-9} | \mathcal{F}_{-10}]$.


Proposition 55
If $Y_n$ is a backwards (sub/super)martingale and $\sup_n \|Y_n\|_1 < \infty$, then $Y_n$ converges to a finite limit $Y_{-\infty}$ almost surely and in $L^1$.


Proof. If $(\cdots, Y_{-3}, Y_{-2}, Y_{-1}, Y_0)$ is a backwards supermartingale, then we can apply the Doob upcrossing inequality for any finite $n \in \mathbb{Z}_{\le 0}$ to find

\[ E\big[ U^Y_{a,b}([n, 0]) \big] \le \frac{E[(Y_0 - a)^-]}{b-a}. \]

Now we can take the limit as $n \to -\infty$, and the expected total number of upcrossings will be uniformly bounded by the finite quantity on the right-hand side. This means that for all rational $a < b$, $U^Y_{a,b}((-\infty, 0]) < \infty$ almost surely, so $Y_n$ must converge almost surely to a limit $Y_{-\infty}$ (or else it would oscillate between two rational numbers infinitely often). The $L^1$ convergence is a uniform integrability argument, which is trickier for supermartingales than for martingales (which we did in 18.675); we should read this on our own.


Theorem 56
Let $X_t$ be a supermartingale and let $D$ be a countable dense subset. Then there is some probability-zero event $N$ such that $t \mapsto X_t(\omega)$ has left and right limits for all $\omega \notin N$. Furthermore,

\[ Y_t(\omega) = \begin{cases} X_{t+}(\omega) = \lim_{s \downarrow t,\ s \in D} X_s(\omega) & \text{if the limit exists,} \\ 0 & \text{otherwise} \end{cases} \]

is a supermartingale with the filtration $\mathcal{G}_t = \mathcal{F}_{t+}$, and $X_t \ge E[Y_t | \mathcal{F}_t]$ with equality if the map $t \mapsto E[X_t]$ is right-continuous (this is the mild condition).


Proof. The first part (left and right limits) follows directly from Corollary 54 and Lemma 51. To check that $Y_t$ is a supermartingale, it's clear that $Y_t \in \mathcal{G}_t$ by the limit definition, and for any sequence $s_k \downarrow t$, $X_{s_k}$ is a backwards supermartingale. This backwards supermartingale is bounded in $L^1$, because the supremum of $E[|X_s|]$ is bounded on finite time intervals, so we can use Proposition 55 to show that $X_{s_k}$ converges almost surely and in $L^1$ to $Y_t$. In particular, $Y_t$ is indeed in $L^1$ (which shows integrability).

Next, the supermartingale condition for $X_t$ shows that $X_t \ge E[X_{s_k} | \mathcal{F}_t]$, and $X_{s_k}$ converges in $L^1$ to $Y_t$, so the right-hand side converges to $E[Y_t | \mathcal{F}_t]$ as $k \to \infty$, showing the desired inequality. Finally for the equality case, if $t \mapsto E[X_t]$ is right continuous, then $E[X_t] = \lim_{s \downarrow t} E[X_s]$. We can switch the limit and expectation by the $L^1$ convergence, so we in fact have

\[ E[X_t] = \lim_{k \to \infty} E[X_{s_k}] = E[Y_t]. \]

This means that we know both that $X_t \ge E[Y_t | \mathcal{F}_t]$ and that $E[X_t] = E[Y_t]$, but these can only hold if the former inequality is actually an equality. Thus $X_t = E[Y_t | \mathcal{F}_t]$ almost surely as desired.

Finally, we still need to show that $Y_t$ actually satisfies the supermartingale inequality. Let $s_n \downarrow s$ and $t_n \downarrow t$ be chosen so that $s < t$ and $s_n \le t_n$ for all $n$. Then for any event $A \in \mathcal{G}_s$, we have

\[ E[Y_s; A] = \lim_{n \to \infty} E[X_{s_n}; A] \]

by the backwards supermartingale $L^1$ convergence, but now we may bound this from below by $\lim_{n \to \infty} E[X_{t_n}; A] = E[Y_t; A]$ (using that $A \in \mathcal{F}_{s_n}$ and that $X$ is a supermartingale), where in the last step we've again switched the limit and expectation. This shows that we do have a supermartingale.


We can now put everything together for the main result: 


Theorem 57
Suppose we have a right-continuous filtration (meaning that $\mathcal{F}_t = \mathcal{F}_{t+}$ for all $t$) and assume that $\mathcal{F}_t$ is complete (meaning it contains the null sets). Let $X_t$ be a supermartingale with right-continuous mean $E[X_t]$. Then $X$ has an rcll modification $\tilde{X}$ which is also a supermartingale with respect to $\mathcal{F}_t$.


Proof. We define

\[ \tilde{X}_t(\omega) = \begin{cases} Y_t(\omega) & \text{if } \omega \in \Omega \setminus N, \\ 0 & \text{otherwise,} \end{cases} \]

where $N$ is the set from Corollary 54. (This means that we're allowed to look into the whole future and see if things go wrong; it's just an issue with measurability.) So now $\tilde{X}_t \in \mathcal{F}_t$ because $\mathcal{F}_t$ is right-continuous and complete, and it is also a supermartingale because $Y_t$ is a supermartingale and $\tilde{X}_t = Y_t$ almost surely.

It remains to show that $\tilde{X}$ is a modification of $X$. By the last part of Theorem 56 (using $\tilde{X}$ instead of $Y$), we have $X_t = E[\tilde{X}_t | \mathcal{F}_t]$ because the mean is right-continuous, but $\tilde{X}_t$ is measurable with respect to $\mathcal{F}_{t+} = \mathcal{F}_t$ and thus $X_t = \tilde{X}_t$ almost surely, as desired.


Basically, with a sufficiently rich filtration and with the mild condition that the deterministic function $t \mapsto E[X_t]$ is right-continuous, we get some nice results.


Remark 58. It's necessary to assume that $E[X_t]$ is right-continuous: as mentioned above, a counterexample otherwise is the deterministic process $X_t = f(t)$ for a nonincreasing, non-right-continuous $f$. It's also necessary to have the assumption of right-continuity of the filtration: consider $\Omega = \{\pm 1\}$, let $P$ be the uniform measure on $\Omega$, and define $X_t(\omega) = \omega\, 1\{t > 1\}$. In words, this means that $X$ starts off as 0 and then jumps to a random sign immediately after time 1, so it is a martingale. However, the filtration generated by $X$ is trivial up to $t = 1$ and then jumps to the full sigma-algebra, so the filtration is not right-continuous, and indeed there is no modification of $X$ that is rcll.


Next time, we'll talk about the optional stopping theorem for continuous martingales, and that will be all from 


chapter 3. 

Last time, we discussed sample path regularity for continuous-time (sub/super)martingales: it was somewhat technical, so let's review the main points. If $D$ is a countable dense subset in $[0, \infty)$, we found that for any (sub/super)martingale $X_t$, its left and right limits

\[ X_{t+}(\omega) = \lim_{s \downarrow t,\ s \in D} X_s(\omega), \qquad X_{t-}(\omega) = \lim_{s \uparrow t,\ s \in D} X_s(\omega) \]

exist, and if the mapping $t \mapsto E[X_t]$ is right-continuous, we actually have $X_t = E[X_{t+} | \mathcal{F}_t]$. Then if $\mathcal{F}_t$ is right-continuous and complete (contains the null sets), $X$ has a modification $\tilde{X}$ satisfying $\tilde{X}_t = X_{t+}$ except on a null set, and this modification has sample paths which are rcll (right-continuous with left limits). This will allow us to generally assume right-continuous sample paths in most of our future discussion.

Today, the discussion will focus on optional stopping theorems, primarily for martingales. The main feature is that we can extend the statement $X_s = E[X_t | \mathcal{F}_s]$ for fixed $s \le t$ to random times; our goal is to prove that if $\sigma \le \tau$ are both stopping times, then $X_\sigma = E[X_\tau | \mathcal{F}_\sigma]$. There are lots of applications of this, many of which are on our homework, but here's a simple example:


Example 59
Let $B_t$ be a Brownian motion started from 0, and let $T = T_a \wedge T_b$ for $a < 0 < b$. Then an optional stopping theorem will tell us (because $B_t$ is a martingale) that

\[ 0 = B_0 = E[B_T] = a\, P(T_a < T_b) + b\, P(T_b < T_a), \]

allowing us to explicitly calculate the probability $p = \frac{b}{b-a}$ that we hit $a$ before we hit $b$.
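The same optional stopping argument applies to the simple random walk on the integers, which gives a setting where the answer can be checked exactly. The sketch below is an assumption-laden stand-in for the Brownian example (integer barriers, a $\pm 1$ walk instead of $B_t$): it solves the harmonic equations $h(x) = \tfrac12 (h(x-1) + h(x+1))$ with $h(a) = 1$, $h(b) = 0$ by a shooting method in exact rational arithmetic. The function name is hypothetical.

```python
from fractions import Fraction

def prob_hit_a_before_b(a, b, start=0):
    """P(simple random walk from `start` hits a before b), for integers a < start < b.

    Any solution of h(x+1) = 2 h(x) - h(x-1) is determined by h(a) and h(a+1).
    We propagate two basis solutions u (u(a)=1, u(a+1)=1) and w (w(a)=0, w(a+1)=1)
    and pick the combination u + c*w that vanishes at b.
    """
    u_prev, u = Fraction(1), Fraction(1)
    w_prev, w = Fraction(0), Fraction(1)
    values = {a: (u_prev, w_prev), a + 1: (u, w)}
    for x in range(a + 2, b + 1):
        u_prev, u = u, 2 * u - u_prev
        w_prev, w = w, 2 * w - w_prev
        values[x] = (u, w)
    u_b, w_b = values[b]
    c = -u_b / w_b  # enforce the boundary condition h(b) = 0
    u_s, w_s = values[start]
    return u_s + c * w_s
```

For example, `prob_hit_a_before_b(-3, 5)` agrees with the formula $b/(b-a) = 5/8$ from the optional stopping computation.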


Note, though, that optional stopping theorems do not hold without further restrictions: 


Example 60
Let $\tau$ be the first time that Brownian motion hits 1. We've shown previously that $\tau$ is finite almost surely, but

\[ 0 = B_0 \ne E[B_\tau] = 1. \]


We do have a reason to believe that the optional stopping theorem should continue to hold, though: we know that if $X_t$ is a martingale, then $E[X_t]$ is constant in $t$, and if $\tau$ is a stopping time, then $X_{t \wedge \tau}$ is a martingale. (There is a small caveat: we know the stopped process is a martingale in discrete time, but we haven't actually proved this for continuous time yet.) But then this means $E[X_{t \wedge \tau}]$ is also constant, so

\[ E[X_0] = E[X_{0 \wedge \tau}] = E[X_{t \wedge \tau}] \]

for all finite $t$. And then taking $t \to \infty$ should yield (for any finite stopping time $\tau$) that

\[ E[X_0] = \lim_{t \to \infty} E[X_{t \wedge \tau}] = E\Big[ \lim_{t \to \infty} X_{t \wedge \tau} \Big] = E[X_\tau], \]

so we've reduced this to the usual $L^1$ convergence question: "can we swap the limit and expectation?". We'll start with an almost-sure (pointwise) convergence result:


Proposition 61
Let $X_t$ be a (sub/super)martingale with right-continuous sample paths, and suppose that $\sup_t \|X_t\|_1 < \infty$ (meaning our process is bounded in $L^1$). Then $X_t$ converges almost surely to $X_\infty \in L^1$ (though $L^1$ convergence may not occur).


Proof. Without loss of generality we can assume that $X_t$ is a supermartingale (otherwise multiply it by $-1$). Then we know by the upcrossing inequality that

\[ E\big[ U^X_{a,b}([0,T] \cap D) \big] \le \frac{E[(X_T - a)^-]}{b-a}, \]

and by the monotone convergence theorem, because the left hand side is increasing as we take $T \to \infty$, we have

\[ E\big[ U^X_{a,b}(D) \big] \le \sup_t \frac{E[(X_t - a)^-]}{b-a} < \infty. \]

This means that $U^X_{a,b}(D)$ is finite almost surely for each rational $a < b$, and thus (taking a countable union of null sets) the same is true simultaneously over all rational $a < b$. In particular, this means $X_t$ must converge (or else there would be a pair of rationals sandwiched between the liminf and limsup). Then Fatou's lemma tells us (because $|X_\infty| = \liminf_{t \to \infty} |X_t|$) that

\[ E[|X_\infty|] \le \liminf_{t \to \infty} E[|X_t|] < \infty, \]

so $X_\infty$ is in $L^1$ as desired.


Again, it's important to remember that $X_t$ does not always converge in $L^1$ to $X_\infty$ under these conditions. Another example that is good to keep in mind is

\[ X_t = \exp(B_t - t/2), \]

which is a nonnegative martingale with $E[X_t] = 1$ for all $t$. However, $B_t$ will be much smaller than $t/2$ as $t \to \infty$, so $X_t$ converges almost surely to $X_\infty = 0$.
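Both halves of this example can be seen numerically from the fact that $B_t \sim N(0,t)$ at a fixed time. The sketch below (function names and the quadrature parameters are illustrative assumptions) computes $E[X_t]$ by a trapezoid rule and $P(X_t > \varepsilon)$ in closed form via the Gaussian tail. Note that it only looks at fixed-time marginals, so it illustrates convergence in probability rather than the almost-sure statement.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard Gaussian Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def prob_exceeds(t, eps):
    """P(exp(B_t - t/2) > eps), using B_t = sqrt(t) * Z with Z ~ N(0, 1)."""
    z = (math.log(eps) + t / 2) / math.sqrt(t)
    return upper_tail(z)

def mean_by_quadrature(t, half_width=12.0, steps=24000):
    """Trapezoid-rule approximation of E[exp(sqrt(t) Z - t/2)] over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        total += weight * math.exp(math.sqrt(t) * x - t / 2) * density
    return total * h
```

The mean stays at 1 while the probability of exceeding any fixed threshold shrinks, which is exactly the failure of $L^1$ convergence: the mass escapes to rare, very large values.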


Proposition 62
Let $X_t$ be a (sub/super)martingale with right-continuous sample paths, and suppose we know that $\sup_t \|X_t\|_p < \infty$ for some $p > 1$. Then $X_t \to X_\infty$ almost surely and in $L^p$, and in particular it also converges in $L^1$.


Proof. This proof is the same as in the discrete case. Recall Doob's $L^p$ inequality, which tells us that

\[ \Big\| \sup_{0 \le s \le t} |X_s| \Big\|_p \le \frac{p}{p-1} \|X_t\|_p. \]

Taking $t \to \infty$, the left side is nondecreasing in $t$, and the right-hand side stays bounded by assumption, so $S = \sup_{t \ge 0} |X_t|$ is in $L^p$. The conditions assumed here are strictly stronger than in the previous proposition, so we know already that $X_t \to X_\infty$ almost surely. But now by the dominated convergence theorem, we have

\[ E\big[ |X_t - X_\infty|^p \big] \to 0, \]

since $|X_t - X_\infty|^p$ is dominated by $(2S)^p$, which we've shown is in $L^1$.


This will help us with some but not all of the cases we're interested in, and in fact we have a precise characterization of when $L^1$ convergence occurs. (And the proofs now will be a bit more complicated than in the discrete time case.)


Definition 63
A collection of random variables $\{X_i\}_{i \in I}$ is uniformly integrable (u.i.) if

\[ \lim_{M \to \infty} \Big( \sup_{i \in I} E\big[ |X_i|;\ |X_i| \ge M \big] \Big) = 0. \]


As a trivial example, let $Z \in L^1(\Omega, \mathcal{F}, P)$, and assume that all of our random variables satisfy $|X_i| \le Z$. Then the $X_i$s are uniformly integrable because $E[|Z|; |Z| \ge M]$ goes to 0. A less trivial example is to consider the collection of random variables

\[ X_{\mathcal{G}} = E[Z | \mathcal{G}], \]

where $\mathcal{G}$ is any sub-$\sigma$-field of $\mathcal{F}$ (showing this is uniformly integrable is a good exercise). Finally, note that being uniformly integrable is stronger than being bounded in $L^1$ (also useful to think about on our own).
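One standard family that is bounded in $L^1$ but not uniformly integrable is $X_n = n \cdot 1\{U \le 1/n\}$ with $U$ uniform on $[0,1]$. This family is not taken from the lecture; it is offered here only as a hedged illustration, with hypothetical function names, of why the last remark holds.

```python
from fractions import Fraction

def tail_mass_spike(n, M):
    """E[|X_n|; |X_n| >= M] for X_n = n * 1{U <= 1/n}, U uniform on [0, 1].

    X_n equals n on an event of probability 1/n and 0 otherwise, so the tail
    expectation is n * (1/n) = 1 when n >= M and 0 when n < M.
    """
    return Fraction(n) * Fraction(1, n) if n >= M else Fraction(0)

def sup_tail_mass(M, max_n):
    """Supremum of the tail expectation over n = 1..max_n (a truncation of the full family)."""
    return max(tail_mass_spike(n, M) for n in range(1, max_n + 1))
```

Every $X_n$ has $E|X_n| = 1$, yet the supremum of the tail expectations stays at 1 for every threshold $M$ (as soon as the family contains some $n \ge M$), so the limit in Definition 63 is 1 rather than 0.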

Theorem 64
Suppose we have a collection of random variables $X_n$ indexed by integer $n$ which converge in probability to $X_\infty$. Then the following are equivalent:

1. The $\{X_n\}$ are uniformly integrable,

2. $X_n$ converges in $L^1$ to $X_\infty$,

3. $E[|X_n|]$ converges to $E[|X_\infty|]$.

(We can see [3] for the proof, but we also proved this last semester in 18.675.) As a word of caution, (1) implies (2) implies (3) if we have a real-indexed process $X_t$, but (3) does not imply (1).


Definition 65
A martingale $X_t$ is closed if there is some $Z \in L^1$ such that $X_t = E[Z | \mathcal{F}_t]$ for all $t$.


Theorem 66
Let $X_t$ be a right-continuous martingale. Then the following are equivalent:

1. $X$ is closed,

2. $\{X_t\}$ is uniformly integrable,

3. $X_t$ converges almost surely and in $L^1$ as $t \to \infty$.


Again, remember that the $L^p$ condition was sufficient, while these assumptions are both necessary and sufficient.


Proof. (1) implies (2) because $\{E[Z | \mathcal{G}]\}$ is always a uniformly integrable family. To show that (2) implies (3), note that uniformly integrable implies $\sup_t \|X_t\|_1 < \infty$, so Proposition 61 tells us that $X_t$ converges almost surely to $X_\infty$, and Theorem 64 tells us that $X_t$ converges in $L^1$ to $X_\infty$ as well. Finally, to show that (3) implies (1), note that we have $X_t = E[X_u | \mathcal{F}_t]$ for all $t \le u < \infty$. But $X_u$ converges to $X_\infty$ in $L^1$ by assumption, so we can pass the limit through the conditional expectation and find that $X_t = E[X_\infty | \mathcal{F}_t]$. Therefore taking $Z = X_\infty$ shows that the martingale is closed.


This result was the main step in proving the discrete-time optional stopping theorem, but the continuous-time case makes a few things more complicated. Remember that if $X_n$ is discrete and adapted to $\mathcal{F}_n$ and $\tau$ is a stopping time, then we define the stopping-time sigma-algebra

\[ \mathcal{F}_\tau = \{ A \in \mathcal{F}_\infty : A \cap \{\tau \le n\} \in \mathcal{F}_n \ \ \forall n \}. \]

Then to check that $X_\tau$ is actually measurable with respect to $\mathcal{F}_\tau$, we just needed to check that $\{X_\tau \in B\} \cap \{\tau \le n\} \in \mathcal{F}_n$. But this is just a finite union of events

\[ \{\tau \le n\} \cap \Big( \bigcup_{k=0}^{n} \{X_k \in B,\ \tau = k\} \Big), \]

which is indeed in $\mathcal{F}_n$. But we immediately get measurability issues if we use continuous time because we'd have to take an uncountable union, so claiming that $X_\tau \in \mathcal{F}_\tau$ does actually require some regularity assumptions.

Proposition 67
Suppose $X_t$ is adapted to $\mathcal{F}_t$ and has right-continuous sample paths, and let $\tau$ be a stopping time. Then $X_\tau \in \mathcal{F}_\tau$.


Proof. We'll prove that $X_\tau$ is a composition of two maps. Fix $t > 0$, and consider the map $F : \Omega \times [0,t] \to \mathbb{R}$ sending $F(\omega, s) = X_s(\omega)$. We claim that this map is measurable with respect to $\mathcal{F}_t \otimes \mathcal{B}_{[0,t]}$ (where $\mathcal{B}$ denotes the Borel sigma-algebra). This property is called being progressive, and it's stronger than being adapted. To show this, we'll approximate $F$ with something that has this property: we split our interval $[0,t]$ into blocks of length $t/n$ and define

\[ F^{(n)}(\omega, s) = X_{\lceil sn/t \rceil / (n/t)}(\omega) \]

(meaning that we take the value of $X$ at the right edge of the block). Each $F^{(n)}$ is measurable, because the preimage of any set $B$ can be written as

\[ (F^{(n)})^{-1}(B) = \bigcup_{k=1}^{n} \Big( \{X_{kt/n} \in B\} \times \big( \tfrac{(k-1)t}{n}, \tfrac{kt}{n} \big] \Big) \ \cup\ \big( \{X_0 \in B\} \times \{0\} \big), \]

which is just a finite collection of rectangles. As we take $n \to \infty$, $F^{(n)}$ converges to $F$ by right-continuity (because $\lceil sn/t \rceil / (n/t)$ approaches $s$ from the right as $n$ gets large), and the pointwise limit of measurable functions is measurable. So now we want to show that $X_\tau \in \mathcal{F}_\tau$, which means that we need to show that

\[ \{X_\tau \in B\} \cap \{\tau \le t\} \in \mathcal{F}_t \]

for all $t$. For any fixed $t$, define $G(\omega) = (\omega, t \wedge \tau(\omega))$, and take $F$ from above. Then

\[ (\Omega, \mathcal{F}_t) \xrightarrow{\ G\ } (\Omega \times [0,t],\ \mathcal{F}_t \otimes \mathcal{B}_{[0,t]}) \xrightarrow{\ F\ } (\mathbb{R}, \mathcal{B}_{\mathbb{R}}) \]

is a composition of two measurable maps, and notably

\[ \{X_\tau \in B\} \cap \{\tau \le t\} = \{\tau \le t\} \cap \{\omega : F(G(\omega)) \in B\}, \]

because $t \wedge \tau(\omega) = \tau(\omega)$ on the event $\tau \le t$. And now $\{\tau \le t\}$ is in $\mathcal{F}_t$ by the definition of a stopping time, and similarly $\{\omega : F(G(\omega)) \in B\}$ is measurable because it's the preimage of a measurable set. Thus we have the desired result.


Let's look now at another result from discrete time that we have to be more careful about in the continuous-time case. In discrete time, the stopped process $X_{n \wedge \tau}$ can be written as the sum

\[ X_{n \wedge \tau} = X_0 + \sum_{k=1}^{n} 1\{k \le \tau\} (X_k - X_{k-1}), \]

which is the same as the $H$-transform $X_0 + (H \cdot X)_n$, where $H_k = 1\{\tau \ge k\}$ is $\mathcal{F}_{k-1}$-measurable. Since each $X_n$ is integrable by definition of a martingale, $X_{n \wedge \tau}$ will be in $L^1$ for all $n$ (since it's a sum over $n+1$ terms each in $L^1$): from there, we can check that $X_{n \wedge \tau}$ is a martingale using the $H$-transform property. But both of these facts are less obvious in the continuous-time case, and we have to do a bit more.
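The discrete-time identity between the stopped process and the $H$-transform is purely algebraic, so it can be verified path by path. Below is a small sketch, assuming a $\pm 1$ walk and taking $\tau$ to be the first hitting time of a level (both are illustrative choices, as are the function names).

```python
from itertools import product

def first_hit(path, level):
    """First index k with path[k] == level, or len(path) if the level is never hit."""
    for k, x in enumerate(path):
        if x == level:
            return k
    return len(path)

def stopped_value(path, tau, n):
    """X_{n ^ tau}: the path frozen from time tau onwards."""
    return path[min(n, tau)]

def h_transform_value(path, tau, n):
    """X_0 + sum_{k=1}^n 1{k <= tau} (X_k - X_{k-1}), i.e. X_0 + (H.X)_n with H_k = 1{tau >= k}."""
    return path[0] + sum(
        (1 if k <= tau else 0) * (path[k] - path[k - 1]) for k in range(1, n + 1)
    )

def identity_holds(n_steps, level):
    """Check the identity at every time n on every +-1 path of length n_steps."""
    for steps in product((-1, 1), repeat=n_steps):
        path = [0]
        for s in steps:
            path.append(path[-1] + s)
        tau = first_hit(path, level)
        for n in range(n_steps + 1):
            if stopped_value(path, tau, n) != h_transform_value(path, tau, n):
                return False
    return True
```

Note that $H_k = 1\{\tau \ge k\}$ is known at time $k-1$, since $\{\tau \ge k\}$ is the complement of $\{\tau \le k-1\}$.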


Theorem 68
Let $X_t$ be a uniformly integrable martingale with right-continuous sample paths, and let $\sigma, \tau$ be stopping times such that $\sigma \le \tau$. Then $X_\sigma, X_\tau$ are both in $L^1$, and $X_\sigma = E[X_\tau | \mathcal{F}_\sigma]$.

Proof. We'll sketch the proof here: assume that we know the discrete-time result already. Approximate the stopping times by

\[ \sigma_n = \frac{\lceil \sigma \cdot 2^n \rceil}{2^n}, \qquad \tau_n = \frac{\lceil \tau \cdot 2^n \rceil}{2^n}. \]

Clearly $\sigma_n \downarrow \sigma$, $\tau_n \downarrow \tau$, $\sigma_n \le \tau_n$, and $\sigma_n, \tau_n$ are stopping times for all $n$ (because they're at least as large as $\sigma, \tau$ respectively, so we "know to stop" at some time later). Then the discrete time optional stopping theorem tells us that $X_{\sigma_n} = E[X_\infty | \mathcal{F}_{\sigma_n}]$, so the $X_{\sigma_n}$ are uniformly integrable. Because $X_{\sigma_n}$ converge almost surely to $X_\sigma$ by right-continuity, that means that $X_{\sigma_n}$ also converges in $L^1$ to $X_\sigma$. Similarly, $X_{\tau_n}$ converges almost surely and in $L^1$ to $X_\tau$. But (by the discrete-time result) we have

\[ X_{\sigma_n} = E[X_{\tau_n} | \mathcal{F}_{\sigma_n}], \]

and any event $A \in \mathcal{F}_\sigma$ is contained in all $\mathcal{F}_{\sigma_n}$ and therefore in their intersection. Thus $E[X_{\sigma_n}; A] = E[X_{\tau_n}; A]$ for all events $A \in \mathcal{F}_\sigma$, and taking $n \to \infty$ yields (by $L^1$ convergence) $E[X_\sigma; A] = E[X_\tau; A]$. This is exactly the definition of the conditional expectation $X_\sigma = E[X_\tau | \mathcal{F}_\sigma]$.


With this, we can finally prove the result that we're after: 


Corollary 69
Let $X_t$ be a martingale with right-continuous sample paths, and let $\tau$ be a stopping time. Then (1) $X_{t \wedge \tau}$ is a martingale, and (2) if $\{X_t\}$ is uniformly integrable, then $\{X_{t \wedge \tau}\}$ is also uniformly integrable with $X_{t \wedge \tau} = E[X_\tau | \mathcal{F}_t]$.


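Proof. We prove (2) first. We know that $\tau$ and $t \wedge \tau$ are both stopping times, so Theorem 68 tells us that $X_\tau, X_{t \wedge \tau}$ are in $L^1$ and that $X_{t \wedge \tau} = E[X_\tau | \mathcal{F}_{t \wedge \tau}]$. Now we wish to show that for all $A$ in $\mathcal{F}_t$,

\[ E[X_{t \wedge \tau}; A] = E[X_\tau; A]. \]

On the event that $\tau \le t$, the expressions inside the expectations are identical, so

\[ E[1_A 1_{\{\tau \le t\}} X_{t \wedge \tau}] = E[1_A 1_{\{\tau \le t\}} X_\tau]. \]

On the other hand, for the event that $\tau > t$, $A \cap \{\tau > t\}$ is in $\mathcal{F}_t$, and it is also in $\mathcal{F}_\tau$ (by the definition of the stopping time sigma-algebra), so it is in their intersection, which is $\mathcal{F}_{t \wedge \tau}$. In other words,

\[ E[1_A 1_{\{\tau > t\}} X_{t \wedge \tau}] = E[1_A 1_{\{\tau > t\}} X_\tau], \]

by using $X_{t \wedge \tau} = E[X_\tau | \mathcal{F}_{t \wedge \tau}]$ and bringing $1_{A \cap \{\tau > t\}}$ inside the conditional expectation. Adding together the two displayed identities shows the result, and uniform integrability follows because the family $\{E[X_\tau | \mathcal{F}_t]\}$ is closed.

Now to show (1) from (2), we know that $X_{t \wedge \tau}$ is measurable with respect to $\mathcal{F}_{t \wedge \tau} \subseteq \mathcal{F}_t$, so our process is indeed adapted to the filtration. Thus, we just need to show integrability and that $X_{s \wedge \tau} = E[X_{t \wedge \tau} | \mathcal{F}_s]$ for $s \le t$. If we fix any finite $t$ and define $Y_s = X_{s \wedge t}$ (so we run the process only for a finite time), then $Y_s$ is uniformly integrable because it is closed: there is a single random variable $X_t$ with conditional expectations $Y_s = E[X_t | \mathcal{F}_s]$. So applying (2) to $Y$, we see that $Y_\tau = X_{t \wedge \tau}$ is in $L^1$, and $Y_{s \wedge \tau} = E[Y_\tau | \mathcal{F}_s]$. In terms of the original process, this means that $X_{s \wedge \tau} = E[X_{t \wedge \tau} | \mathcal{F}_s]$, showing the martingale condition as desired.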


8 February 26, 2020 


Today, we're starting with Chapter 4 of our textbook, which discusses continuous semimartingales. In short, these are processes of the form $X_t = M_t + A_t$, where $M_t$ is a continuous local martingale and $A_t$ is a finite variation process. We haven't defined any of these terms or classified any of these processes yet, which will be the topic of the next few classes. But before that, we'll take a step back and look at where things are headed (because the last few lectures have been a bit technical).


Example 70
Consider the standard Brownian motion $B_t$, which is a continuous martingale. Suppose that we're interested in studying how $X_t = f(B_t)$ evolves for some smooth function $f$.


Heuristically, we can use Itô's formula, which tells us something of the form

\[ dX_t = f'(B_t)\, dB_t + \frac{1}{2} f''(B_t)\, dt \]

(where the main idea is that we've replaced the $(dB_t)^2$ term with $dt$). In particular, this means $X$ is not a martingale unless $f$ is linear, which makes sense: a linear scaling $B_t \mapsto \sigma B_t$ should still be a martingale, but otherwise we get a drift term caused by the curvature $f''$ of our function. (Intuitively, a positively curved function $f$ makes $f(B_{t+dt})$ larger than the linear approximation $f(B_t) + f'(B_t)\, dB_t$.) So Itô's formula gives us a decomposition

\[ X_t = X_0 + \int_0^t f'(B_s)\, dB_s + \int_0^t \frac{1}{2} f''(B_s)\, ds, \]

where the two integrals turn out to be exactly the processes $M_t$ and $A_t$ we'll be constructing. So this definition is really coming out of manipulations of Brownian motion.

Throughout this discussion, it might be useful to keep the discrete-time picture in mind. Say that a process $X_n$ is adapted to $\mathcal{F}_n$, where $X_0 = 0$ and $E|X_n| < \infty$ for all $n$. Then we can decompose

\[ X_n = \sum_{i=1}^{n} \big( X_i - X_{i-1} - E[X_i - X_{i-1} | \mathcal{F}_{i-1}] \big) + \sum_{i=1}^{n} E[X_i - X_{i-1} | \mathcal{F}_{i-1}], \]

in which the first term corresponds to the martingale $M_n$, and the second term to the finite variation process $A_n$. It may not be entirely clear how this $A_n$ relates to the $A_t$ above, though: to make that more apparent, imagine that our process $X_n$ is a function $f(S_n)$, where $S_n = \epsilon \sum_{j=1}^{n} \zeta_j$, $\zeta_j \sim \mathrm{Unif}\{\pm 1\}$, is a random walk with step size $\epsilon$. Then

\[ A_{n+1} - A_n = E[ f(S_n + \epsilon \zeta_{n+1}) - f(S_n) | \mathcal{F}_n ], \]

and now we can Taylor expand in $\epsilon$ to write this as

\[ \epsilon f'(S_n)\, E[\zeta_{n+1} | \mathcal{F}_n] + \frac{1}{2} \epsilon^2 f''(S_n)\, E[\zeta_{n+1}^2 | \mathcal{F}_n] + o(\epsilon^2) \]

(taking out the constants and derivatives because they're measurable with respect to $\mathcal{F}_n$). But the $\zeta$s are iid symmetric random signs, so $E[\zeta_{n+1} | \mathcal{F}_n] = 0$ and $\zeta_{n+1}^2 = 1$, and this simplifies to

\[ A_{n+1} - A_n = \frac{1}{2} \epsilon^2 f''(S_n) + o(\epsilon^2), \]


which now looks identical to the $A_t$ term above. So to summarize, we want to understand this decomposition in the continuous case: chapters 4 and 5 will help us with a formal characterization of this class of processes. Along the way, we'll prove Itô's formula, which essentially tells us that the image of a continuous semimartingale under a smooth map is another continuous semimartingale: that is, $h(M_t + A_t) = M'_t + A'_t$ for sufficiently nice $h$. Heuristically, the idea is that if $X_t = M_t + A_t$, then

\[ dh(M_t + A_t) = h'(X_t)\, dX_t + \frac{1}{2} h''(X_t) (dX_t)^2 \]

\[ = h'(X_t)(dM_t + dA_t) + \frac{1}{2} h''(X_t) \big( (dM_t)^2 + 2\, dM_t\, dA_t + (dA_t)^2 \big). \]

A step of the martingale $dM_t$ is "like a step of the Brownian motion with some variance," so $dM_t$ is a Gaussian step of size of order $\sqrt{dt}$ (that is, variance of order $dt$), and $dA_t$ is something deterministic of order $dt$. So we can toss the last two terms, which are of order $dt^{3/2}$ and $dt^2$, and we're just left with the three terms of highest order:

\[ dh(M_t + A_t) = h'(X_t)\, dM_t + \Big( h'(X_t)\, dA_t + \frac{1}{2} h''(X_t) (dM_t)^2 \Big), \]

where the $h'(X_t)\, dM_t$ term will be the martingale part and the rest is the finite variation part. We don't really know how to integrate any of these terms yet, so we have a lot to understand. We'll start with the $h'(X_t)\, dA_t$ term. Our first task is to understand what a finite variation process is and how we take an integral against it.
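The discrete-time drift computation from earlier in this lecture (the predictable increment of $f(S_n)$ for a random walk with step size $\epsilon$) can be checked exactly for polynomial $f$. The sketch below uses rational arithmetic; the function name and the test polynomials $x^3$ and $x^4$ are illustrative assumptions.

```python
from fractions import Fraction

def predictable_increment(f, s, eps):
    """E[f(s + eps * zeta) - f(s)] for a symmetric random sign zeta.

    This is the increment A_{n+1} - A_n of the finite variation part,
    evaluated at the current position S_n = s.
    """
    return (f(s + eps) + f(s - eps)) / 2 - f(s)

def cube(x):
    return x ** 3

def fourth_power(x):
    return x ** 4

s, eps = Fraction(3, 2), Fraction(1, 100)
# For f(x) = x^3, f''(x) = 6x, and the increment equals (1/2) eps^2 f''(s) exactly.
cubic_gap = predictable_increment(cube, s, eps) - Fraction(1, 2) * eps ** 2 * 6 * s
# For f(x) = x^4, f''(x) = 12 x^2, and the leftover is exactly eps^4 = o(eps^2).
quartic_gap = predictable_increment(fourth_power, s, eps) - Fraction(1, 2) * eps ** 2 * 12 * s ** 2
```

With $\epsilon^2$ playing the role of $dt$, this is the discrete shadow of the $\frac12 h''(X_t)(dM_t)^2$ term in the heuristic expansion.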

Today, we'll start with the case where $A_t$ is some deterministic function of time and $X_t$ is some deterministic process. The reason for eliminating the $\omega$ here is that $t$ will "be our $\omega$."


Definition 71
Let $(\Omega, \mathcal{F})$ be a measurable space (not the probability space of our process). A finite signed measure on $(\Omega, \mathcal{F})$ is a function $\alpha : \mathcal{F} \to \mathbb{R}$ such that $\alpha$ is countably additive, meaning that if we have disjoint sets $A_i \in \mathcal{F}$, then

\[ \alpha\Big( \bigsqcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} \alpha(A_i), \]

where the sum must be absolutely convergent.


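Example 72
If $\alpha_+, \alpha_-$ are (actual nonnegative) finite measures on $(\Omega, \mathcal{F})$, then $\alpha = \alpha_+ - \alpha_-$ is a signed measure. Also, if $\mu$ is a measure on $(\Omega, \mathcal{F})$ such that $\int_\Omega |h|\, d\mu < \infty$, then $\nu(E) = \int_E h\, d\mu$ is also a signed measure. In fact, $h = \frac{d\nu}{d\mu}$ is the Radon-Nikodym derivative.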


It turns out that we can only get a signed measure by writing $\alpha = \alpha_+ - \alpha_-$ as the difference of two measures. Let's see how to show this:


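Definition 73
Let $\alpha$ be a signed measure on $(\Omega, \mathcal{F})$. Then $A \in \mathcal{F}$ is positive if $\alpha(B) \ge 0$ for all measurable $B \subseteq A$ (similarly negative if $\alpha(B) \le 0$), and $A$ is a null set if $\alpha(B) = 0$ for all measurable $B \subseteq A$.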


Theorem 74 (Hahn decomposition theorem)
For any signed measure $\alpha$ on $(\Omega, \mathcal{F})$, there is a bipartition $\Omega = \Omega_+ \sqcup \Omega_-$ such that $\Omega_+$ is positive and $\Omega_-$ is negative. This decomposition is essentially unique: for any other decomposition $\Omega = B_+ \sqcup B_-$, the sets $B_+ \cap \Omega_-$ and $B_- \cap \Omega_+$ must be null sets.


We should refer to the textbook from 18.675, [3], for the proof, which is about a paragraph long.


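Theorem 75 (Jordan decomposition theorem)
Any signed measure can be uniquely written as $\alpha = \alpha_+ - \alpha_-$, where $\alpha_+, \alpha_-$ are measures on $(\Omega, \mathcal{F})$ supported on disjoint sets.


Proof. Let $\alpha_+(E) = \alpha(E \cap \Omega_+)$ and $\alpha_-(E) = -\alpha(E \cap \Omega_-)$, both of which are nonnegative by the Hahn decomposition. We can check that this is unique.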


We can now connect this to the idea of a finite variation process: 


Definition 76
A continuous function $a : [0,T] \to \mathbb{R}$ is of finite/bounded variation (FV or BV) if there exists a signed measure $\alpha$ on $([0,T], \mathcal{B}_{[0,T]})$ such that $a(t) = \alpha([0,t])$ for all $0 \le t \le T$.


By the Jordan decomposition, this means we can write

\[ a(t) = \alpha_+([0,t]) - \alpha_-([0,t]), \]

where the two terms on the right hand side are nondecreasing functions of $t$ (because $\alpha_+, \alpha_-$ are actual measures). We will sometimes refer to these as $a_+(t)$ and $a_-(t)$. Then the measure $\mu = |\alpha| = \alpha_+ + \alpha_-$ is the total variation measure of $a$, and we define $v(t) = \mu([0,t])$ to be the "total variation of $a$ on the interval $[0,t]$." So now if we want to define an integral against the function $a(s)$, one natural way to do so is to define it in terms of our measure:

\[ \int_0^T f(s)\, da(s) = \int_{[0,T]} f(s)\, \alpha(ds), \]

and similarly $\int_0^T f(s)\, |da(s)| = \int_{[0,T]} f(s)\, \mu(ds)$. It's important to emphasize that in both of these equations, the left-hand side is new notation, while the right-hand side is just a Lebesgue integral which we know how to compute. In order for these integrals to be well-defined, we just need to make sure that $f$ is measurable and absolutely integrable (that is, $\int_{[0,T]} |f(s)|\, \mu(ds) < \infty$). Here are two simple properties of this integral:

• By Jensen's inequality, we have $\big| \int_0^T f(s)\, da(s) \big| \le \int_0^T |f(s)|\, |da(s)|$.

• The function $b(t) = \int_0^t f(s)\, da(s)$ is also of finite variation, which will be important for something like Itô's formula. Indeed, we can define $b(t) = \beta([0,t])$, where $\beta$ is the signed measure $\beta(E) = \int_E f(s)\, \alpha(ds)$ (this should look like the Radon-Nikodym derivative equation). Now decomposing into positive and negative parts for both $f$ and $\alpha$, the explicit formula for $\beta$ is that

\[ \beta(E) = \int_E \big( f_+(s)\, \alpha_+(ds) + f_-(s)\, \alpha_-(ds) \big) - \int_E \big( f_-(s)\, \alpha_+(ds) + f_+(s)\, \alpha_-(ds) \big). \]
E E 


So to recap, we're considering the space $(\Omega, \mathcal{F}) = ([0,t], \mathcal{B})$, and we have a signed measure $\alpha$ corresponding to a decomposition $\Omega = \Omega_+ \sqcup \Omega_-$. This yields a corresponding decomposition $\alpha = \alpha_+ - \alpha_-$ and $\mu = \alpha_+ + \alpha_-$. Note that $\alpha_+$ is absolutely continuous with respect to $\mu$ (because if a set has zero measure under $\mu$, then it also has zero measure under $\alpha_+$) and similarly, $\alpha_- \ll \mu$. Furthermore, we can write down the Radon-Nikodym derivatives using the defining properties: we have

\[ \frac{d\alpha_+}{d\mu} = 1_{\Omega_+}, \qquad \frac{d\alpha_-}{d\mu} = 1_{\Omega_-} \]

(because we want $\alpha_+(E) = \int_E \frac{d\alpha_+}{d\mu}\, d\mu$ to be the integral over only $\Omega_+$, and similar logic for $\alpha_-$); call these two functions $h_+$ and $h_-$. Then the corresponding finite-variation function $a : [0,t] \to \mathbb{R}$ can be written as

\[ a(s) = \alpha([0,s]) = \int_{[0,s]} h(r)\, \mu(dr), \quad \text{where } h = \frac{d\alpha}{d\mu} = h_+ - h_-. \]

Decomposing as $a(s) = a_+(s) - a_-(s)$, we see that the total variation is $v(s) = a_+(s) + a_-(s)$. Therefore,

\[ a_+(s) = \frac{1}{2}\big( v(s) + a(s) \big), \qquad a_-(s) = \frac{1}{2}\big( v(s) - a(s) \big). \]

Remember that our eventual goal is to understand the quantity $\int_0^t h'(X_s)\, dA_s$. Once we introduce randomness back in the process, our processes $A$ and $X$ will usually be correlated, so we need to ask how we can calculate the integral (for example, whether we can do it using time-steps like a Riemann integral). The first step is to look at a simpler quantity:


Lemma 77
Let $a : [0,t] \to \mathbb{R}$ be a function of finite variation. Then

\[ \int_0^t |da(s)| = \sup\{ V_P(a) : P \text{ subdivision of } [0,t] \}, \]

where $P$ is of the form $\{0 = t_0 < t_1 < \cdots < t_p = t\}$ and $V_P(a) = \sum_{i=1}^{p} |a(t_i) - a(t_{i-1})|$. In addition, if we have an increasing set of subdivisions $P_1 \subseteq P_2 \subseteq \cdots$ and the mesh of $P_n$ goes to zero, then $V_{P_n}(a) \to \int_0^t |da(s)|$.


Basically, we can approximate total variation by a discrete subdivision. This proof is similar to the Radon-Nikodym 


theorem, but we'll present a self-contained argument here: 


Proof. We know that $V_P(a) \le \int_0^t |da(s)|$ for all $P$ by Jensen (on each interval, the absolute value of the integral is at most the integral of the absolute value), so it suffices to show the convergence result (in other words, show that we do approach $\int_0^t |da(s)|$). By scaling, we can assume without loss of generality that $\mu([0, t]) = 1$, so that $([0, t], \mathcal{B}, \mu)$ is a probability space. Let $\mathcal{G}_n$ be the sigma-algebra generated by the intervals of $P_n$. For each fixed $n$, $\mathcal{G}_n$ is a finite collection, and the $\mathcal{G}_n$ are nondecreasing because the $P_n$ are nondecreasing. Then the sigma-algebra $\mathcal{G}_\infty$ generated by the union of the $\mathcal{G}_n$ is just the Borel sigma-algebra on $[0, t]$ (because the mesh goes to zero).

By the Hahn decomposition, we can write $X(s) = h(s) = 1\{s \in \Omega_+\} - 1\{s \in \Omega_-\}$. $X$ is a random variable on $([0, t], \mathcal{B}, \mu)$, and we have the filtration $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \cdots \subset \mathcal{B}$, which gives us the closed martingale (implying uniformly integrable as well) $X_n = \mathbb{E}[X \mid \mathcal{G}_n]$. This means that $X_n$ converges to $X$ almost surely and in $L^1$, but because $\mathcal{G}_n$ is finite, $X_n$ is piecewise constant (on the intervals of $P_n$). Then (with the break points $t_i$ depending on $n$) we can write
\[ \mathbb{E}\big[X_n; [t_{i-1}, t_i]\big] = X_n(t_i)\,\mu([t_{i-1}, t_i]) \]
(where technically it's possible to modify $X_n$ at any point, but the point is that it's the measure of the interval times the value at any given point in that interval). But we also know that on this same interval (recalling that $X$ and $h$ are the same) we have
\[ \mathbb{E}\big[X; [t_{i-1}, t_i]\big] = \int_{t_{i-1}}^{t_i} h(s)\,\mu(ds) = a(t_i) - a(t_{i-1}). \]
But these two expectations should be equal, because $X_n = \mathbb{E}[X \mid \mathcal{G}_n]$, so the value of $X_n$ on $[t_{i-1}, t_i]$ is equal to
\[ \frac{a(t_i) - a(t_{i-1})}{\mu([t_{i-1}, t_i])}. \]
Now because $X_n$ converges to $X$ in $L^1$, $\mathbb{E}[|X_n|]$ converges to $\mathbb{E}[|X|]$. $X_n$ is piecewise constant, so we can write
\[ \mathbb{E}[|X_n|] = \sum_{i=1}^{|P_n|} \mu([t_{i-1}, t_i]) \left| \frac{a(t_i) - a(t_{i-1})}{\mu([t_{i-1}, t_i])} \right| = \sum_{i=1}^{|P_n|} |a(t_i) - a(t_{i-1})| = V_{P_n}(a). \]
Meanwhile, $|X|$ is 1 almost surely (it's either $1$ or $-1$ almost everywhere), so
\[ \mathbb{E}[|X|] = 1 = \mu([0, t]) = \int_0^t |da(s)|. \]
Thus as $n \to \infty$, the convergence $\mathbb{E}[|X_n|] \to \mathbb{E}[|X|]$ yields the result.
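As a quick numerical illustration of Lemma 77 (a sketch, not part of the lecture), the following Python snippet computes $V_P(a)$ along nested dyadic subdivisions. The choice $a(s) = \sin(s)$ on $[0, 5]$ and the helper names `variation` and `dyadic` are assumptions made for this example; the exact total variation here is $1 + 2 + (1 + \sin 5) = 4 + \sin 5$.

```python
import math

def variation(a, points):
    """V_P(a): sum of |a(t_i) - a(t_{i-1})| over consecutive points of P."""
    return sum(abs(a(t1) - a(t0)) for t0, t1 in zip(points, points[1:]))

def dyadic(t, n):
    """Subdivision of [0, t] into 2**n equal pieces (nested as n grows)."""
    return [t * k / 2**n for k in range(2**n + 1)]

t = 5.0
# Exact total variation of sin on [0, 5]: up by 1, down by 2, up by 1 + sin(5).
exact = 4 + math.sin(5.0)
values = [variation(math.sin, dyadic(t, n)) for n in range(1, 13)]

for n, v in enumerate(values, start=1):
    print(n, v, exact - v)
```

The printed gaps `exact - v` are nonnegative and shrink as the mesh goes to zero, which is the monotone convergence the lemma describes.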
Corollary 78
Let $a$ be a function of finite variation and $f : [0, t] \to \mathbb{R}$ be a continuous function. If $P_1 \subset P_2 \subset \cdots$ are subdivisions of $[0, t]$ with mesh going to 0, then
\[ \sum_{i=1}^{|P_n|} f(t_{i-1})\big(a(t_i) - a(t_{i-1})\big) \to \int_0^t f(s)\,da(s), \]
where again the $t_i$ implicitly depend on $n$.


This is useful because we have more practice looking at things like the left hand side, so we'll be able to simplify 


calculations. 


Proof. We start by trying to define the Stieltjes integral: let $t_i^{(n)}$ be our break points corresponding to the subdivision $P_n$, and define $f^{(n)}(s) = f(t_{i-1}^{(n)})$ for all $s \in [t_{i-1}^{(n)}, t_i^{(n)})$. The $f^{(n)}$ are bounded uniformly, because $\|f^{(n)}\|_\infty \le \|f\|_\infty < \infty$, and $f^{(n)}$ converges to $f$ pointwise because $f$ is continuous. But then the left hand side of the equation is $\int_0^t f^{(n)}\,da$, which converges to $\int_0^t f\,da$ by the dominated convergence theorem.
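To see Corollary 78 in action, here is a small Python sketch (an illustration with assumed inputs, not from the lecture). It uses $f(s) = s$ and the non-monotone finite-variation function $a(s) = s^2 - s$ on $[0, 1]$, for which $\int_0^1 f\,da = \int_0^1 s(2s - 1)\,ds = 1/6$; the helper name `stieltjes_sum` is hypothetical.

```python
def stieltjes_sum(f, a, points):
    """Left-endpoint sum: sum of f(t_{i-1}) * (a(t_i) - a(t_{i-1}))."""
    return sum(f(t0) * (a(t1) - a(t0)) for t0, t1 in zip(points, points[1:]))

f = lambda s: s              # continuous integrand
a = lambda s: s * s - s      # finite variation, decreasing then increasing

exact = 1.0 / 6.0            # integral of s * (2s - 1) over [0, 1]
approximations = []
for n in range(1, 13):
    points = [k / 2**n for k in range(2**n + 1)]   # nested dyadic subdivisions
    approximations.append(stieltjes_sum(f, a, points))

print(approximations[-1], exact)
```

The error of the left-endpoint sum is at most the mesh times the total variation of $a$, so halving the mesh roughly halves the error.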


Remark 79. We say that a function $a : [0, \infty) \to \mathbb{R}$ has finite variation if $a$ has finite variation on any compact interval $[0, t]$.


9 March 2, 2020


Solutions for the first two homework assignments are posted on Stellar now; the next homework is due on Monday, 
and we'll have office hours 3-5 on Thursday instead of today. 

Recall that a function $a : [0, T] \to \mathbb{R}$ is of finite/bounded variation if there exists a signed measure $\alpha = \alpha_+ - \alpha_-$ on $[0, T]$ such that $a(t) = \alpha([0, t]) = a_+(t) - a_-(t)$ (noting that $a(0) = 0$). Last time, we defined the integral
\[ \int_0^t f(s)\,da(s) = \int_{[0,t]} f(s)\,\alpha(ds), \]
which is well-defined as long as we have the absolute integrability condition $\int_0^t |f(s)|\,|da(s)| < \infty$. We noted that the integral $\int_0^t f(s)\,da(s)$, as a function of $t$, is also of finite variation, and we noticed that if $f$ is continuous and we have a sequence of subdivisions $P_1 \subset P_2 \subset \cdots$ of $[0, t]$ with mesh going to zero, then the discrete approximations $\sum_i f(t_{i-1}^{(n)})\big(a(t_i^{(n)}) - a(t_{i-1}^{(n)})\big)$ converge to the integral $\int_0^t f(s)\,da(s)$.

Everything last time was deterministic, so we'll add randomness now. We'll be on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ for the rest of this lecture, and for the rest of this chapter we'll assume that we have continuous sample paths.


Definition 80
An adapted process $A_t$ (to our filtered probability space) is a finite variation process if all sample paths have finite variation. If all sample paths are nondecreasing in $t$, then we say that $A_t$ is an increasing process.

Recall that a process $H_t$ is progressive if, for every $t$, the map $F(\omega, s) = H_s(\omega)$ on $\Omega \times [0, t]$ is measurable with respect to the sigma-algebra $\mathcal{F}_t \otimes \mathcal{B}_{[0,t]}$. In general, this is stronger than being adapted, but any continuous adapted process will be progressive.
Proposition 81
Let $A$ be a finite-variation process, and let $H$ be a progressive process. Suppose that $\int_0^t |H_s(\omega)|\,|dA_s(\omega)|$ is finite for all finite $t$. Then $(H \cdot A)_t = \int_0^t H_s\,dA_s$ is a well-defined finite variation process.

We'll skip over this proof for now, but the fact that this is an FV process follows from the result above that $\int_0^t f(s)\,da(s)$ is FV. What we need to check is that this process is measurable with respect to $\mathcal{F}_t$; we should read this on our own, and this is where we use the progressive condition.

Remember that our original goal was to study the class of continuous semimartingales of the form $X_t = A_t + M_t$, where $A$ is FV and $M$ is a (local) martingale. We'll spend some time now on the latter object.


Definition 82
A continuous local martingale $M_t$ on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ is an adapted continuous process for which there exists a sequence of stopping times $(T_n)$ such that

• $T_n(\omega) \uparrow \infty$ for all $\omega$,

• The stopped processes $(M_{t \wedge T_n} - M_0)$ are uniformly integrable martingales for all $n$ (we often say that $T_n$ reduces $M$).


Example 83
Let $B_t$ be a Brownian motion in $\mathbb{R}^3$ (started away from the origin), and define $M_t = \frac{1}{|B_t|} = (B_{t,1}^2 + B_{t,2}^2 + B_{t,3}^2)^{-1/2}$. This is not a true martingale, but it is a local martingale. (We'll talk more about the difference in a future lecture.)


Note that all continuous martingales are continuous local martingales, because we can just take $T_n = n$: we know that $M_s = \mathbb{E}[M_n \mid \mathcal{F}_s]$ for all $s \le n$, so the variables $(M_{t \wedge n} - M_0)$ are just conditional expectations of $M_n - M_0$ and thus must be uniformly integrable. This same conditional expectation argument implies that we can actually leave out "uniformly integrable" from the definition.

Remark 84. The optional stopping theorem tells us that a stopped uniformly integrable martingale is still a uniformly integrable martingale, so any stopping time $T$ of a continuous local martingale $M$ gives us a continuous local martingale $M_{t \wedge T}$.

Note that one set of stopping times to consider (which we can always use) is
\[ T_n = \inf\{t : |M_t - M_0| = n\}. \]
The idea is that at any finite $n$, all of the stopped processes $M_{t \wedge T_n} - M_0$ are bounded by $n$, so we have uniform integrability.


Proposition 85
Let $M$ be a nonnegative continuous local martingale with $M_0 \in L^1$ (we need to add this condition because it's no longer true that $M_t$ needs to be integrable in general). Then $M$ is a true supermartingale but not necessarily a martingale.

Proof. Let $N_t = M_t - M_0$, and let $T_n$ be the reducing sequence for $M$. Then for all $s \le t$, we have
\[ N_{s \wedge T_n} = \mathbb{E}[N_{t \wedge T_n} \mid \mathcal{F}_s] \]
because $T_n$ creates a uniformly integrable martingale, so the optional stopping theorem holds. Since $M_0$ is integrable, we can add it back to both sides to find
\[ M_{s \wedge T_n} = \mathbb{E}[M_{t \wedge T_n} \mid \mathcal{F}_s]. \]
Taking limits on both sides, $M_{s \wedge T_n}$ converges to $M_s$ by continuity, while the right hand side satisfies
\[ \lim_{n \to \infty} \mathbb{E}[M_{t \wedge T_n} \mid \mathcal{F}_s] \ge \mathbb{E}\big[\liminf_{n \to \infty} M_{t \wedge T_n} \,\big|\, \mathcal{F}_s\big] = \mathbb{E}[M_t \mid \mathcal{F}_s] \]
by Fatou's lemma and continuity, so we get the supermartingale inequality $M_s \ge \mathbb{E}[M_t \mid \mathcal{F}_s]$.


Theorem 86 


If X is both a continuous local martingale and a finite variation process, then X = 0. 


This is a useful result to have, because it gives us uniqueness of the decomposition $X_t = M_t + A_t$.

Proof. Since $X$ is a finite variation process, we know that $X_0 = 0$ and $X_t = \int_0^t dX_s$. Then because $X$ is also a continuous local martingale, we can define
\[ T_n = \inf\Big\{t : \int_0^t |dX_s| = n\Big\} \]
to be the first time the total variation of $X$ reaches $n$. Letting $X_t^{(n)} = X_{t \wedge T_n}$, we know that $|X_t^{(n)}| \le n$ for all $t$ (because we stop the process before it can change by more than $n$), which means that $X^{(n)}$ is bounded and therefore uniformly integrable for each $n$. Fix $n$ and let $N = X^{(n)}$ for notation (note that $N$ is a martingale). We have
\[ \mathbb{E}\big[(N_{s_2} - N_{s_1})(N_{t_2} - N_{t_1})\big] = 0 \]
for all $s_1 < s_2 \le t_1 < t_2$ (because when we condition on $\mathcal{F}_{t_1}$, the martingale condition tells us this expectation is zero, and then taking another expectation gives us 0 overall). This means that for any subdivision $0 = t_0 < t_1 < \cdots < t_p = t$ we can break up the sum as
\[ \mathbb{E}[N_t^2] = \mathbb{E}\Big[\Big(\sum_i (N_{t_i} - N_{t_{i-1}})\Big)^2\Big] = \sum_i \mathbb{E}\big[(N_{t_i} - N_{t_{i-1}})^2\big], \]
but the fact that $\int_0^t |dN_s| \le n$ means that summing squares of increments will give us something small. Specifically,
\[ \mathbb{E}[N_t^2] \le \mathbb{E}\Big[\Big(\sup_i |N_{t_i} - N_{t_{i-1}}|\Big) \cdot \sum_i |N_{t_i} - N_{t_{i-1}}|\Big]. \]
The sum is at most $n$ by definition, and the first term goes to 0 as the mesh of $\{t_i\}$ goes to zero (by continuity of the sample paths). Because the whole expression inside the expectation is bounded, we know that this goes to 0 by the dominated convergence theorem. Thus $\mathbb{E}[N_t^2] = 0$, meaning that $N_t = X_t^{(n)} = X_{t \wedge T_n}$ is identically zero for all $n$, which can only happen if $X = 0$.


From here on, we assume that $(\mathcal{F}_t)$ is complete.

Theorem 87
Let $M$ be a continuous local martingale. Then there exists an increasing (finite variation) process $A_t = \langle M, M \rangle_t$, called the quadratic variation, such that $M_t^2 - A_t$ is a continuous local martingale. $A$ is unique up to null sets, and if $P_1 \subset P_2 \subset \cdots$ is an increasing sequence of subdivisions of $[0, t]$ with mesh going to zero, then the sum of squared increments $\sum_i (M_{t_i^{(n)}} - M_{t_{i-1}^{(n)}})^2$ along $P_n$ converges to $A_t$ in probability as $n \to \infty$.

Note that $A$ is the formalization of what we labeled as $\int_0^t (dM_s)^2$ in our earlier heuristic arguments. This is a really important quantity: as motivation for why we care about this, if $M$ is a continuous local martingale and $A_t = \langle M, M \rangle_t$, then $M_t$ is equal in distribution to a Brownian motion indexed by $A_t$. So a continuous local martingale is just a time change of Brownian motion (and in the special case where $\langle M, M \rangle_t = t$, $M$ is just a standard Brownian motion).


Start of proof. We'll prove this assuming that $M$ is a bounded continuous martingale. (Generalizing to continuous local martingales takes very little work.) Again, we assume $M_0 = 0$ for simplicity.

For motivation, first consider a discrete-time martingale $M_n$. We can write
\[ M_n^2 = \sum_{i=1}^n \Big(M_i^2 - M_{i-1}^2 - \mathbb{E}[M_i^2 - M_{i-1}^2 \mid \mathcal{F}_{i-1}]\Big) + \sum_{i=1}^n \mathbb{E}[M_i^2 - M_{i-1}^2 \mid \mathcal{F}_{i-1}], \]
where the first summation is a martingale. So a natural idea is to take a discrete approximation for our continuous-time martingale, and hope that the remainder converges to our process $A$. It turns out that we don't even need the expectations: we can take $A^{(n)} = \sum_i (M_{t_i} - M_{t_{i-1}})^2$. More formally (in continuous time), $M$ is a continuous bounded martingale, so we can take our increasing subdivisions $P_n$ of $[0, T]$ (with mesh going to zero) and define
\[ A_t^{(n)} = \sum_{i=1}^{|P_n|} \Big(M_{t \wedge t_i^{(n)}} - M_{t \wedge t_{i-1}^{(n)}}\Big)^2. \]
Because $M_t^2 = \sum_{i=1}^{|P_n|} \big(M_{t \wedge t_i^{(n)}}^2 - M_{t \wedge t_{i-1}^{(n)}}^2\big)$, subtracting the two and expanding yields
\[ N_t^{(n)} = M_t^2 - A_t^{(n)} = 2 \sum_{i=1}^{|P_n|} M_{t \wedge t_{i-1}^{(n)}} \Big(M_{t \wedge t_i^{(n)}} - M_{t \wedge t_{i-1}^{(n)}}\Big), \]
which is a bounded martingale (by assumption, since $M$ is a bounded martingale).
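The algebra behind the decomposition $M^2 = A^{(n)} + N^{(n)}$ is a purely pathwise identity, which can be sanity-checked on any sequence of values. The following Python sketch (an illustration, with a simple random walk and a seed chosen arbitrarily for this example) verifies it exactly in integer arithmetic.

```python
import random

rng = random.Random(0)          # arbitrary seed, only for reproducibility
steps = [rng.choice((-1, 1)) for _ in range(1000)]

# Path of a simple random walk started at M_0 = 0.
M = [0]
for x in steps:
    M.append(M[-1] + x)

# A: sum of squared increments; N: twice the sum of M_{i-1} * (M_i - M_{i-1}).
A = sum((M[i] - M[i - 1]) ** 2 for i in range(1, len(M)))
N = 2 * sum(M[i - 1] * (M[i] - M[i - 1]) for i in range(1, len(M)))

print(M[-1] ** 2, A + N)
```

For the $\pm 1$ walk every squared increment equals 1, so `A` is just the number of steps, matching the fact that $M_n^2 - n$ is a martingale.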


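Lemma 88
Suppose that $|M_t| \le C$ for all $t \le T$, and suppose that $M_0 = 0$. Then for any subdivision $0 = t_0 < t_1 < \cdots < t_p = T$,
\[ \mathbb{E}\Big[\Big(\sum_i (M_{t_i} - M_{t_{i-1}})^2\Big)^2\Big] \le 12 C^4. \]
(Without the assumption that $M_0 = 0$, the right-hand side is instead $48 C^4$.)

Proof of lemma. Expanding out the left hand side, we get diagonal terms for $i = j$ and off-diagonal terms otherwise:
\[ \mathbb{E}\Big[\Big(\sum_i (M_{t_i} - M_{t_{i-1}})^2\Big)^2\Big] = \sum_i \mathbb{E}\big[(M_{t_i} - M_{t_{i-1}})^4\big] + 2 \sum_{i < j} \mathbb{E}\big[(M_{t_i} - M_{t_{i-1}})^2 (M_{t_j} - M_{t_{j-1}})^2\big]. \]
Simplify the first term by pulling out some factors (and bounding them by $(2C)^2$), and compute the sum over $j > i$ in the second term to get
\[ \mathbb{E}\Big[\Big(\sum_i (M_{t_i} - M_{t_{i-1}})^2\Big)^2\Big] \le (2C)^2 \sum_i \mathbb{E}\big[(M_{t_i} - M_{t_{i-1}})^2\big] + 2 \sum_i \mathbb{E}\big[(M_{t_i} - M_{t_{i-1}})^2 (M_T - M_{t_i})^2\big], \]
where we've used the orthogonality of disjoint intervals (since $M_T - M_{t_i} = \sum_{j > i} (M_{t_j} - M_{t_{j-1}})$). Now bound the factor $(M_T - M_{t_i})^2$ by $(2C)^2$ to show that
\[ \mathbb{E}\Big[\Big(\sum_i (M_{t_i} - M_{t_{i-1}})^2\Big)^2\Big] \le \big[(2C)^2 + 2(2C)^2\big] \sum_i \mathbb{E}\big[(M_{t_i} - M_{t_{i-1}})^2\big] = 12 C^2 \cdot \mathbb{E}[M_T^2], \]
again using orthogonality (here's where we gain the factor of 4 when $M_0 = 0$ isn't assumed), and this last expression is at most $12 C^2 \cdot C^2 = 12 C^4$, as desired.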


Lemma 89
If we still assume that $M$ is bounded, then the sequence $(N_T^{(n)})$ is Cauchy in $L^2$ (as $n \to \infty$).

Proof of lemma. Suppose $m < n$, so that the subdivision $P_m$ is contained in the subdivision $P_n$. Let $P_m = (s_i : 1 \le i \le |P_m|)$ and $P_n = (t_j : 1 \le j \le |P_n|)$; because $P_n$ is a refinement of $P_m$, we can also index the $t_j$ as $t_{i,k}$, where the first index $i$ tells us that we are in the range $[s_{i-1}, s_i]$. Now the expected covariance can be calculated by writing out the definition of $N_T$:
\[ \frac{1}{4} \mathbb{E}\big[N_T^{(m)} N_T^{(n)}\big] = \sum_{i,j} \mathbb{E}\big[M_{s_{i-1}} (M_{s_i} - M_{s_{i-1}})\, M_{t_{j-1}} (M_{t_j} - M_{t_{j-1}})\big]. \]
If the time intervals are disjoint, the contribution to the expectation is zero, so we only get a contribution when the $t$ increments are inside the $s$ increments, and thus this expectation reduces to a sum over subincrements
\[ \sum_{i,k} \mathbb{E}\big[M_{s_{i-1}} (M_{s_i} - M_{s_{i-1}})\, M_{t_{i,k-1}} (M_{t_{i,k}} - M_{t_{i,k-1}})\big]. \]
Each term here involves the times $s_{i-1} \le t_{i,k-1} < t_{i,k} \le s_i$, so we can further break up the increment $M_{s_i} - M_{s_{i-1}}$, and the only term that remains is the "middle one" (by conditioning at an appropriate time). Thus we have
\[ \frac{1}{4} \mathbb{E}\big[N_T^{(m)} N_T^{(n)}\big] = \sum_{i,k} \mathbb{E}\big[M_{s_{i-1}} M_{t_{i,k-1}} (M_{t_{i,k}} - M_{t_{i,k-1}})^2\big]. \]
So now if we want to show that our sequence $N_T^{(n)}$ is Cauchy in $L^2$, we want to calculate the $L^2$ distance. Expanding and then simplifying the terms to make the sums line up, we find that
\[ \begin{aligned} \frac{1}{4} \mathbb{E}\big[(N_T^{(m)} - N_T^{(n)})^2\big] &= \sum_i \mathbb{E}\big[M_{s_{i-1}}^2 (M_{s_i} - M_{s_{i-1}})^2\big] - 2(\text{cross term above}) + \sum_{i,k} \mathbb{E}\big[M_{t_{i,k-1}}^2 (M_{t_{i,k}} - M_{t_{i,k-1}})^2\big] \\ &= \sum_{i,k} \mathbb{E}\big[(M_{s_{i-1}} - M_{t_{i,k-1}})^2 (M_{t_{i,k}} - M_{t_{i,k-1}})^2\big] \\ &\le \mathbb{E}\Big[\sup_{i,k} (M_{s_{i-1}} - M_{t_{i,k-1}})^4\Big]^{1/2} \, \mathbb{E}\Big[\Big(\sum_{i,k} (M_{t_{i,k}} - M_{t_{i,k-1}})^2\Big)^2\Big]^{1/2}, \end{aligned} \]
where the last step is by Cauchy-Schwarz. The second term is bounded by our previous lemma, and the first term converges to 0 again by the dominated convergence theorem (using continuity and boundedness of $M$), so we do have Cauchy convergence in $L^2$.


We'll finish up the proof next time, but this lemma is the main point — knowing that the process is Cauchy allows 


us to use the maximal inequality to get convergence. 
10 March 4, 2020

Today, we'll continue the proof that we started last time. The main idea is that we have a continuous local martingale $M$, and want to decompose
\[ M_t^2 = (\text{continuous local martingale}) + (\text{finite variation process}). \]
We know that this is unique, because any continuous local martingale that is of finite variation must be identically zero. Specifically, our goal is to show that (1) there exists a finite variation process $A_t = \langle M, M \rangle_t$ (called the quadratic variation of $M$) such that (2) $M_t^2 - A_t$ is a local martingale, and (3) if $P_n$ is an increasing sequence of subdivisions of $[0, t]$ with mesh going to zero, we have
\[ \sum_{i=1}^{|P_n|} \Big(M_{t_i^{(n)}} - M_{t_{i-1}^{(n)}}\Big)^2 \xrightarrow{\ \mathbb{P}\ } \langle M, M \rangle_t. \]


i=1 
Continuation of proof of Theorem 87. Remember that we're proving this result assuming that $M$ is bounded and that $M_0 = 0$. For any subdivision of $[0, T]$ and any $t \in [0, T]$, we can decompose
\[ M_t^2 = \sum_{i=1}^{|P_n|} \Big(M_{t \wedge t_i}^2 - M_{t \wedge t_{i-1}}^2\Big), \]
which can then further be decomposed as
\[ M_t^2 = \sum_{i=1}^{|P_n|} \big(M_{t \wedge t_i} - M_{t \wedge t_{i-1}}\big)^2 + 2 \sum_{i=1}^{|P_n|} M_{t \wedge t_{i-1}} \big(M_{t \wedge t_i} - M_{t \wedge t_{i-1}}\big). \]
The first term, which we denote $A_t^{(n)}$, is supposed to approach our finite variation process, and we showed last time that the second term, which we denote $N_t^{(n)}$, is a martingale. We showed last time that $N_T^{(n)}$ is Cauchy in $L^2$ (as $n \to \infty$) for any finite $T$, as long as $M$ is bounded. So now by Doob's $L^2$ inequality, there is some constant $C$ such that
\[ \Big\| \sup_{0 \le t \le T} \big|N_t^{(m)} - N_t^{(n)}\big| \Big\|_2 \le C\, \big\|N_T^{(m)} - N_T^{(n)}\big\|_2. \]
As $m, n \to \infty$, the right-hand side goes to zero: because we have a martingale, having control of the endpoint gives us control of the entire process in the form of a kind of uniform convergence. In particular, we can find $n_k \to \infty$ (by extracting a subsequence) so that
\[ \Big\| \sup_{0 \le t \le T} \big|N_t^{(n_k)} - N_t^{(n_{k+1})}\big| \Big\|_2 \le \frac{1}{2^k}, \]
which implies that (because $\mathbb{E}[|X|] = \|X\|_1 \le \|X\|_2$)
\[ \mathbb{E}\Big[\sum_{k=1}^{\infty} \sup_{0 \le t \le T} \big|N_t^{(n_k)} - N_t^{(n_{k+1})}\big|\Big] \le \sum_{k=1}^{\infty} \frac{1}{2^k} < \infty. \]
This means that the quantity inside the expectation is finite almost surely, so the $N^{(n_k)}$ converge uniformly on $[0, T]$ outside of a null set $\Omega_0$. We can thus define
\[ Y_t(\omega) = \begin{cases} \lim_{k \to \infty} N_t^{(n_k)}(\omega) & \omega \notin \Omega_0, \\ 0 & \text{otherwise.} \end{cases} \]
This process is adapted to $\mathcal{F}_t$, and we just need to check that $Y_t$ is a martingale. But we've shown that $N_t^{(n_k)}$ converges to $Y_t$ almost surely and in $L^2$, so because each $N^{(n_k)}$ is a martingale, the statement $\mathbb{E}[Y_t \mid \mathcal{F}_s] = Y_s$ follows by taking a limit and passing the integral through (using $L^2$ convergence).

So now we can construct our process $A_t = M_t^2 - Y_t$ by taking the (uniform) limit (on $[0, T]$) of the $A_t^{(n_k)}$, because $M_t^2$ is fixed and $N_t^{(n_k)}$ converges to $Y_t$ uniformly, as shown above. While it is not necessarily true that $A^{(n)}$ is an increasing process, we do know that each $A^{(n)}$ is nondecreasing on the set $\{t_i^{(n)}\}$. The set of $t_i$ is dense in the limit, so continuity tells us that the limit $A$ is indeed a nondecreasing process on $[0, T]$.

Finally, repeating this process for all integers $T \ge 1$ yields a collection of processes $M_t^2 = A_t^{(T)} + Y_t^{(T)}$ for $t \le T$. We just need to show that all of these are compatible with each other, which follows from the uniqueness claim we made earlier. In particular, the stopped process $M_{t \wedge T}^2 - A_{t \wedge T}^{(T)}$ is a martingale, and if $T' > T$, then $M_{t \wedge T}^2 - A_{t \wedge T}^{(T')}$ is also a martingale. So if we subtract these, $A^{(T)}$ and $A^{(T')}$ must agree up to time $T$ (because the difference is both a local martingale and a finite variation process), so this means $A_t$ is indeed well-defined.

It just remains to check the final claim, but we have already proven that
\[ \sum_{i=1}^{|P_n|} \Big(M_{t_i^{(n)}} - M_{t_{i-1}^{(n)}}\Big)^2 = A_T^{(n)} = M_T^2 - N_T^{(n)}. \]
As $n \to \infty$, this converges in $L^2$ to $M_T^2 - Y_T$, which is exactly $A_T$, so in particular the convergence in probability follows as well.


All of the discussion above proves the theorem in the case where $M$ is bounded and $M_0 = 0$. Extending to the general case is easy, and we can read the details of that on our own.


Example 90
Let $B$ be a standard Brownian motion. We've shown that $B_t^2 - t$ is a martingale, so $\langle B, B \rangle_t = t$.

In a future lecture, we'll see the converse as well, which tells us that a continuous local martingale $M$ with $\langle M, M \rangle_t = t$ is a Brownian motion. In fact, if $M_t$ is both a continuous martingale and a Gaussian process, then $M_t^2 - \mathbb{E}[M_t^2]$ is a martingale, so we have $\langle M, M \rangle_t = \mathbb{E}[M_t^2]$. So in both of these cases the quadratic variation is deterministic, but in general it can be random.
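A rough simulation can make the contrast between Example 90 and Theorem 86 concrete. The Python sketch below (illustrative only; the grid size, the seed, and the helper names `squared_increments` and `brownian_path` are assumptions of this example) compares the sum of squared increments of a discretized Brownian path on $[0, 1]$, which should be close to $t = 1$, with that of the smooth path $s \mapsto \sin(s)$, which should be close to 0.

```python
import math
import random

def squared_increments(path):
    """Sum of squared consecutive increments along a sampled path."""
    return sum((y - x) ** 2 for x, y in zip(path, path[1:]))

def brownian_path(n, t, rng):
    """Sample B at the grid points k*t/n using independent N(0, t/n) increments."""
    sd = math.sqrt(t / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

rng = random.Random(2020)   # arbitrary seed for reproducibility
n, t = 20000, 1.0

qv_brownian = squared_increments(brownian_path(n, t, rng))
qv_smooth = squared_increments([math.sin(k * t / n) for k in range(n + 1)])

print(qv_brownian, qv_smooth)
```

The smooth path has finite variation, so its squared increments sum to at most (mesh) times (total variation); the Brownian path does not, which is why its quadratic variation survives in the limit.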


Theorem 91
Suppose that $M$ is a continuous local martingale, and $M_0 \in L^2$. Then the following are equivalent:

• $M$ is a true martingale bounded in $L^2$.

• $\mathbb{E}[\langle M, M \rangle_\infty]$ is finite.

If these properties hold, then $M_t^2 - \langle M, M \rangle_t$ is also a true, uniformly integrable martingale.

Based on our work above, it's natural to claim that $M_t^2 - \langle M, M \rangle_t$ is a martingale and thus the expectations of the two terms are equal for all $t$. However, the main difficulty is that we only know that we have a local martingale.


Lemma 92
If $M$ is a continuous local martingale such that $|M_t| \le Z \in L^1$ for all $t$, then $M$ is a uniformly integrable martingale.

Proof of lemma. By definition, there are stopping times $T_n$ such that $M_{t \wedge T_n}$ is a uniformly integrable martingale for all $n$. Thus, we have
\[ M_{s \wedge T_n} = \mathbb{E}[M_{t \wedge T_n} \mid \mathcal{F}_s] \]
for all $s \le t$, and since $M$ is a continuous process, $M_{t \wedge T_n}$ converges to $M_t$ as $n \to \infty$. By assumption, $|M_t|$ is dominated by $Z$ for all $t$, which means $M_{t \wedge T_n}$ is also dominated by $Z$ for all $n$ and thus the collection is uniformly integrable. Thus $M_{t \wedge T_n}$ converges in $L^1$ to $M_t$ as $n \to \infty$, meaning that by the dominated convergence theorem we have $M_s = \mathbb{E}[M_t \mid \mathcal{F}_s]$, and thus we do have a martingale. Finally, again the condition that $|M_t| \le Z$ guarantees that we have a uniformly integrable process.


Proof of Theorem 91. We can assume $M_0 = 0$ without loss of generality (because all of the quantities here only depend on increments of $M$). To show the forward direction, first note that $S = \sup_{t \ge 0} |M_t|$ is in $L^2$ by the Doob $L^2$ inequality. Define the stopping time
\[ \sigma_n = \inf\{t : \langle M, M \rangle_t = n\}; \]
then $Y_t = M_t^2 - \langle M, M \rangle_t$ is a local martingale, so the stopped version $Y_{t \wedge \sigma_n}$ is also a local martingale. But
\[ |Y_{t \wedge \sigma_n}| = \big|M_{t \wedge \sigma_n}^2 - \langle M, M \rangle_{t \wedge \sigma_n}\big| \le S^2 + n, \]
which is in $L^1$ because $S \in L^2$. Thus, by Lemma 92, the stopped process $Y_{t \wedge \sigma_n}$ is a uniformly integrable martingale (because it's dominated by $S^2 + n$), and thus $\mathbb{E}[M_{t \wedge \sigma_n}^2] = \mathbb{E}[\langle M, M \rangle_{t \wedge \sigma_n}]$. By assumption, $M$ is a true martingale bounded in $L^2$, and thus the left-hand side is uniformly bounded by $\mathbb{E}[S^2]$, while the right-hand side increases to $\mathbb{E}[\langle M, M \rangle_t]$ by the monotone convergence theorem. Thus taking $n \to \infty$ and then $t \to \infty$ yields the result.

On the other hand, assume the total quadratic variation has finite expectation. Define the stopping times
\[ T_n = \inf\{t : |M_t| = n\}; \]
similarly $Y_t = M_t^2 - \langle M, M \rangle_t$ is a local martingale, so $Y_{t \wedge T_n}$ is also a local martingale. And now
\[ |Y_{t \wedge T_n}| = \big|M_{t \wedge T_n}^2 - \langle M, M \rangle_{t \wedge T_n}\big| \le n^2 + \langle M, M \rangle_\infty \]
is in $L^1$ by assumption, so $Y_{t \wedge T_n}$ is a uniformly integrable martingale again by Lemma 92. This means that
\[ \mathbb{E}[M_{t \wedge T_n}^2] = \mathbb{E}[\langle M, M \rangle_{t \wedge T_n}] \le \mathbb{E}[\langle M, M \rangle_\infty] < \infty, \]
so $M$ is bounded in $L^2$ (by using Fatou's lemma as $n \to \infty$). It remains to show that we have a martingale: we already know that $\mathbb{E}[M_{t \wedge T_n} \mid \mathcal{F}_s] = M_{s \wedge T_n}$. For each $t$, the collection $(M_{t \wedge T_n})_n$ is bounded in $L^2$, and we'll prove below in Proposition 93 that this implies uniform integrability. Therefore we can pass the limit through the integral as $n \to \infty$ to find that $\mathbb{E}[M_t \mid \mathcal{F}_s] = M_s$, as desired.


It remains only to prove the following: 


Proposition 93
If $\{X_i\}$ are bounded in $L^p$ for some $p > 1$, then $\{X_i\}$ is uniformly integrable.

Proof. Let $q = \frac{p}{p-1}$ so that $\frac{1}{p} + \frac{1}{q} = 1$. By Hölder's inequality,
\[ \mathbb{E}\big[|X_i|; |X_i| \ge M\big] \le \|X_i\|_p \, \big\|1\{|X_i| \ge M\}\big\|_q. \]
But the $\|X_i\|_p$ are uniformly bounded by some constant $C$, and the second term is just $\mathbb{P}(|X_i| \ge M)^{1/q}$. Thus this simplifies by Markov's inequality to
\[ \mathbb{E}\big[|X_i|; |X_i| \ge M\big] \le C \left(\frac{\mathbb{E}[|X_i|^p]}{M^p}\right)^{1/q} \le \frac{C \cdot C^{p/q}}{M^{p/q}}. \]
This goes to 0 uniformly over $i$ as $M \to \infty$, so we've verified the desired uniform integrability condition.


We'll finish by wrapping up some minor points from chapter 4: 


Definition 94
Let $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ be a filtered probability space. If $M, N$ are both continuous local martingales, then their quadratic covariation is
\[ \langle M, N \rangle_t = \frac{1}{2}\big(\langle M + N, M + N \rangle_t - \langle M, M \rangle_t - \langle N, N \rangle_t\big). \]
We say that $M, N$ are orthogonal if their covariation is zero.

For example, if we take two independent Brownian motions $B_1, B_2$, then we can check that $\langle B_1, B_2 \rangle = 0$.
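As a hedged numerical check of this example (a sketch with an arbitrary grid size and seed, not a proof), the Python snippet below simulates increments of two independent Brownian motions on $[0, 1]$ and compares the direct sum of products of increments with the value obtained from the polarization formula in Definition 94, applied at the discrete level.

```python
import math
import random

rng = random.Random(94)     # arbitrary seed for reproducibility
n, t = 20000, 1.0
sd = math.sqrt(t / n)

# Independent N(0, t/n) increments of two Brownian motions on a grid of n steps.
d1 = [rng.gauss(0.0, sd) for _ in range(n)]
d2 = [rng.gauss(0.0, sd) for _ in range(n)]

# Discrete analogue of <B1, B2>_t: sum of products of increments.
cross = sum(x * y for x, y in zip(d1, d2))

# Same quantity via polarization: (QV(B1 + B2) - QV(B1) - QV(B2)) / 2.
qv_sum = sum((x + y) ** 2 for x, y in zip(d1, d2))
qv_1 = sum(x * x for x in d1)
qv_2 = sum(y * y for y in d2)
polarized = 0.5 * (qv_sum - qv_1 - qv_2)

print(cross, polarized)
```

The two numbers agree up to floating-point rounding, and both are close to zero (their standard deviation is about $1/\sqrt{n}$), consistent with orthogonality of independent Brownian motions.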


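Theorem 95 (Kunita-Watanabe)
Let $M, N$ be continuous local martingales, and let $H, K$ be measurable processes, meaning that the map $(t, \omega) \mapsto H_t(\omega)$ (resp. $K_t(\omega)$) is measurable on $\mathcal{F} \otimes \mathcal{B}_{[0,\infty)}$. Then
\[ \int_0^\infty |H_s|\,|K_s|\,|d\langle M, N \rangle_s| \le \left(\int_0^\infty H_s^2\,d\langle M, M \rangle_s\right)^{1/2} \left(\int_0^\infty K_s^2\,d\langle N, N \rangle_s\right)^{1/2}. \]

(In particular, it is sufficient for $H$ and $K$ to be adapted and continuous.) This is a kind of analogy to the Cauchy-Schwarz inequality. We won't go through the proof in full here, but the main idea is that the left-hand side can be approximated by
\[ \sum_{i=1}^{|P_n|} \Big| H_{t_{i-1}^{(n)}} \big(M_{t_i^{(n)}} - M_{t_{i-1}^{(n)}}\big) K_{t_{i-1}^{(n)}} \big(N_{t_i^{(n)}} - N_{t_{i-1}^{(n)}}\big) \Big|, \]
and by ordinary Cauchy-Schwarz this is bounded from above by
\[ \left(\sum_{i=1}^{|P_n|} H_{t_{i-1}^{(n)}}^2 \big(M_{t_i^{(n)}} - M_{t_{i-1}^{(n)}}\big)^2\right)^{1/2} \left(\sum_{i=1}^{|P_n|} K_{t_{i-1}^{(n)}}^2 \big(N_{t_i^{(n)}} - N_{t_{i-1}^{(n)}}\big)^2\right)^{1/2}. \]
But as $n \to \infty$, these two expressions converge to the left and right sides of the theorem statement, as desired.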
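The discrete step in this sketch is just the ordinary Cauchy-Schwarz inequality applied to the vectors $(H_i \Delta M_i)$ and $(K_i \Delta N_i)$. The following Python snippet checks that inequality on randomly generated data; the arrays `H`, `K`, `dM`, `dN` are arbitrary stand-ins chosen for this illustration, not simulated martingales.

```python
import math
import random

rng = random.Random(95)     # arbitrary seed for reproducibility
n = 1000
sd = math.sqrt(1.0 / n)

# Stand-in data: bounded integrand values and small increments.
H = [rng.uniform(-1.0, 1.0) for _ in range(n)]
K = [rng.uniform(-1.0, 1.0) for _ in range(n)]
dM = [rng.gauss(0.0, sd) for _ in range(n)]
dN = [rng.gauss(0.0, sd) for _ in range(n)]

# Left side: sum of |H_i dM_i K_i dN_i|.
lhs = sum(abs(h * m * k * x) for h, m, k, x in zip(H, dM, K, dN))

# Right side: product of the two l^2 norms.
norm_hm = math.sqrt(sum((h * m) ** 2 for h, m in zip(H, dM)))
norm_kn = math.sqrt(sum((k * x) ** 2 for k, x in zip(K, dN)))
rhs = norm_hm * norm_kn

print(lhs, rhs)
```

The inequality `lhs <= rhs` holds for every choice of data, which is why the bound survives the passage to the limit in the theorem.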


Definition 96
A process $X_t$ is a continuous semimartingale if it can be written as $X_t = M_t + A_t$, where $M_t$ is a continuous local martingale and $A_t$ is a finite variation process. If we have two such processes $X_t = M_t + A_t$ and $Y_t = M'_t + A'_t$ on the same space, then we define their covariation to be $\langle X, Y \rangle_t = \langle M, M' \rangle_t$.

And this definition should make sense, because the sum
\[ \sum_{i=1}^{|P_n|} \big(X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}}\big)\big(Y_{t_i^{(n)}} - Y_{t_{i-1}^{(n)}}\big) \]
will converge to $\langle M, M' \rangle_t$ as our subdivision $P_n$ grows finer (in other words, the finite variation part doesn't contribute to the covariation or quadratic variation).
11 March 9, 2020


We have an exam on Thursday evening in 2-449 (future note: this did not end up happening). This may be the last 
thing we do in person before we get quarantined, but the class is not too big, so this is probably okay. 

Today, we'll start discussing stochastic integration (from chapter 5). We're going to start seeing nice applications of the theory we've been developing, and we'll start with a bit of review. Recall that for a local martingale $M_t$, subtracting off the quadratic variation $\langle M, M \rangle_t$ (an increasing, finite variation process) from $M_t^2$ yields a local martingale $M_t^2 - \langle M, M \rangle_t$. We showed that if $M_0 \in L^2$, then $M$ is a true martingale bounded in $L^2$ if and only if $\mathbb{E}[\langle M, M \rangle_\infty]$ is finite; in this case, $M_t^2 - \langle M, M \rangle_t$ will in fact be a uniformly integrable (true) martingale because
\[ \sup_t \big|M_t^2 - \langle M, M \rangle_t\big| \le \Big(\sup_t |M_t|\Big)^2 + \langle M, M \rangle_\infty, \]
where the first term is integrable by Doob's $L^2$ inequality and the second term is integrable by assumption. (We wouldn't be asked to show Doob's $L^2$ inequality on the exam, but we would be expected to be able to reproduce the above argument.) At the end of last lecture, we also defined the bracket
\[ \langle M, N \rangle = \frac{1}{2}\big(\langle M + N, M + N \rangle - \langle M, M \rangle - \langle N, N \rangle\big), \]
which in particular lets us calculate
\[ M_t N_t - \langle M, N \rangle_t = \frac{1}{2}\Big((M_t + N_t)^2 - \langle M + N, M + N \rangle_t\Big) - \frac{1}{2}\Big(M_t^2 - \langle M, M \rangle_t\Big) - \frac{1}{2}\Big(N_t^2 - \langle N, N \rangle_t\Big). \]
Each term on the right is a local martingale, so the left side is also a local martingale, and if $M, N$ are bounded in $L^2$, so is $M + N$. In such a situation, all three terms on the right side are uniformly integrable martingales, so $M_t N_t - \langle M, N \rangle_t$ is a uniformly integrable martingale as well, and in particular this means that $\boxed{\mathbb{E}[M_\infty N_\infty] = \mathbb{E}[\langle M, N \rangle_\infty]}$ if $M_0 = N_0 = 0$.


Our goal today is to take a semimartingale $X_t = A_t + M_t$ and a class of processes $H_t$ and define the stochastic integral
\[ (H \cdot X)_t = (H \cdot A)_t + (H \cdot M)_t = \int_0^t H_s\,dA_s + \int_0^t H_s\,dM_s. \]
We've already seen how to compute the first integral: $A$ corresponds to a signed measure, so this is just the Lebesgue integral of $H_s$ against that signed measure. The notation suggests that this should be a continuous-time version of the Doob transform. Recall that in the discrete case, we defined
\[ (H \cdot M)_n = \sum_{i=1}^n H_i (M_i - M_{i-1}). \]
This object was a martingale for appropriate $H$, and analogously the continuous-time version $\int_0^t H_s\,dM_s$ will turn out to be a local martingale.
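For intuition about the discrete Doob transform, here is a small exact check in Python (a sketch under assumed choices: a simple symmetric random walk for $M$, and a "predictable" integrand $H_i$ equal to the running maximum of the walk before step $i$; the helper names are hypothetical). Averaging $(H \cdot M)_n$ over all $2^n$ equally likely paths gives exactly zero, as the martingale property predicts.

```python
from itertools import product

def walk(steps):
    """Path M_0 = 0, M_1, ..., M_n of a walk with the given +/-1 steps."""
    M = [0]
    for x in steps:
        M.append(M[-1] + x)
    return M

def doob_transform(H, M):
    """(H . M)_n = sum_i H_i (M_i - M_{i-1}), where H_i sees only M_0..M_{i-1}."""
    return sum(H(M[:i]) * (M[i] - M[i - 1]) for i in range(1, len(M)))

# Predictable integrand (an arbitrary choice for this example): past running maximum.
H = lambda past: max(past)

n = 8
# Sum of (H . M)_n over all 2**n equally likely paths; the mean is total / 2**n.
total = sum(doob_transform(H, walk(steps)) for steps in product((-1, 1), repeat=n))

print(total)
```

The key point is that `H` is evaluated on `M[:i]`, the path strictly before the increment it multiplies; letting `H` peek at `M[i]` would generally break the zero-mean property.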
The reason we talk about everything in L? is that this is an important case: we'll define the stochastic integral for 


martingales bounded in L? today, and extending to local martingales will be done next time. 


Definition 97
On a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$, let $\mathbb{H}^2$ denote the space of $L^2$-bounded martingales.

On this space $\mathbb{H}^2$, we can apply the boxed identity from above: since $\mathbb{E}[M_\infty N_\infty] = \mathbb{E}[\langle M, N \rangle_\infty]$, we define the scalar product on $\mathbb{H}^2$ via
\[ (M, N)_{\mathbb{H}^2} = \mathbb{E}[\langle M, N \rangle_\infty] = \mathbb{E}[M_\infty N_\infty]. \]
In particular, if $\mathbb{E}[\langle M, M \rangle_\infty] = 0$, then $M$ must be identically zero (given $M_0 = 0$), so this is indeed a norm. With this norm, $\mathbb{H}^2$ is a Hilbert space. To prove this, we need to check that it's complete, meaning that a Cauchy sequence (with respect to this norm) converges. We won't do this in much detail because it's similar to what we've already seen, but the point is that if we have $\lim_{m,n \to \infty} \|M^n - M^m\|_{\mathbb{H}^2} = 0$, then we can use the Doob $L^2$ inequality to show uniform convergence of the $M^n$.

Recall that a process $H_t$ is progressive if, for every $t$, the function $F(\omega, s) = H_s(\omega)$ restricted to $\Omega \times [0, t]$ is measurable with respect to $\mathcal{F}_t \otimes \mathcal{B}_{[0,t]}$. An equivalent characterization is to say that $F$ is measurable with respect to $\mathcal{P}$, where $A \in \mathcal{P}$ if and only if the process $X_t(\omega) = 1\{(\omega, t) \in A\}$ is progressive. The point here is that there is a sigma-field $\mathcal{P}$ such that measurability with respect to $\mathcal{P}$ is equivalent to being progressive, but it's not defined in a very useful way.


Definition 98
For an $L^2$-bounded martingale $M \in \mathbb{H}^2$, let $L^2(M)$ be the space of all progressive processes $H$ such that
\[ \mathbb{E}\left[\int_0^\infty H_s^2\,d\langle M, M \rangle_s\right] < \infty. \]

Remember that $\langle M, M \rangle_t$ is of finite variation, so the inner integral is just a Lebesgue integral. Note that $L^2(M)$ is equivalent to a standard $L^2$ space $L^2(M) = L^2(\Omega \times [0, \infty), \mathcal{P}, \nu)$, where the measure $\nu$ is defined via
\[ \nu(A) = \mathbb{E}\left[\int_0^\infty 1\{(\omega, t) \in A\}\,d\langle M, M \rangle_t\right] < \infty. \]
This means that $L^2(M)$ is also a Hilbert space with all of the corresponding nice properties, so we won't need to prove those again. In particular, we have an inner product
\[ (H, K)_{L^2(M)} = \mathbb{E}\left[\int_0^\infty H_t K_t\,d\langle M, M \rangle_t\right]. \]
From here, the idea is that for any $M \in \mathbb{H}^2$, we will define the stochastic integral with respect to $M$ as an $L^2$ isometry $J^M : L^2(M) \to \mathbb{H}^2$ which maps a process $H$ via
\[ H \mapsto J^M(H) = H \cdot M = \left(\int_0^t H_s\,dM_s\right)_{t \ge 0}. \]
It's helpful to write out what the isometry is directly: if it needs to preserve scalar products, we must have
\[ (H, K)_{L^2(M)} = \mathbb{E}\left[\int_0^\infty H_t K_t\,d\langle M, M \rangle_t\right] = (H \cdot M, K \cdot M)_{\mathbb{H}^2} = \mathbb{E}\big[(H \cdot M)_\infty (K \cdot M)_\infty\big] = \mathbb{E}\left[\left(\int_0^\infty H_t\,dM_t\right)\left(\int_0^\infty K_t\,dM_t\right)\right]. \]
In the particular case $H = K$, this is called the Itô isometry. To define such an isometry, we can first define it on a dense subspace of $L^2(M)$ and extend by continuity to the entire space:


Definition 99
Let $\mathcal{E}$ be the space of elementary processes of the form $H_t = \sum_{i=1}^p H_{(i)} 1\{t \in (t_i, t_{i+1}]\}$, where each $H_{(i)} \in \mathcal{F}_{t_i}$ is almost surely bounded by some constant.

These processes are pretty simple: we have some set of deterministic times, and on each time interval we put a (measurable) random variable. We have $\mathcal{E} \subset L^2(M)$ because of the boundedness condition, and note that $\mathcal{E}$ makes no reference to $M$ at all.
Proposition 100
$\mathcal{E}$ is dense in $L^2(M)$ with respect to the scalar product on $L^2(M)$.

Proof. In an ordinary setting, we know that simple functions are dense in $L^2$, but this argument is a bit more complicated because we need to account for the sigma-algebra. It's enough to verify that if $K \in L^2(M)$ with $K \perp \mathcal{E}$, then $K = 0$. (Then the usual Hilbert space theory gives us denseness.) If $K$ is orthogonal to $\mathcal{E}$, this means that for all $H \in \mathcal{E}$,
\[ 0 = (H, K)_{L^2(M)} = \mathbb{E}\left[\int_0^\infty H_t K_t\,d\langle M, M \rangle_t\right]. \]
We will deduce that $X_t = \int_0^t K_s\,d\langle M, M \rangle_s$ is identically zero. First, we claim that $X$ is well-defined as a finite-variation process; for this, we just need absolute integrability. Specifically, by Cauchy-Schwarz we have
\[ \mathbb{E}\left[\int_0^\infty |K_s|\,d\langle M, M \rangle_s\right] \le \sqrt{\mathbb{E}\left[\int_0^\infty K_s^2\,d\langle M, M \rangle_s\right] \cdot \mathbb{E}\left[\int_0^\infty 1\,d\langle M, M \rangle_s\right]} < \infty, \]
where we know the first term in the square root is finite because $K \in L^2(M)$, and similarly the second term is finite because $M$ is in $\mathbb{H}^2$. The integral on the left-hand side upper bounds $|X_t|$, so this implies that $X_t \in L^1$ for all $t$. Now define
\[ H_r(\omega) = F_{(s)}(\omega)\,1\{r \in (s, t]\}, \]
where $F_{(s)}$ is some bounded random variable that is measurable with respect to $\mathcal{F}_s$. We've assumed that $K \perp \mathcal{E}$, so
\[ 0 = (H, K)_{L^2(M)} = \mathbb{E}\left[F_{(s)} \int_s^t K_r\,d\langle M, M \rangle_r\right] = \mathbb{E}\big[F_{(s)} (X_t - X_s)\big] \]
by definition of $X_t$. Since we already proved that $X_t \in L^1$, this last calculation tells us that $X$ is a martingale. But it is also a finite variation process, so we must have $X = 0$. This implies that $K = 0$ as an element of $L^2(M)$ (meaning that it is zero except on a set of measure zero with respect to $\nu$), completing the proof.


Theorem 101
Let $M \in \mathbb{H}^2$ be an $L^2$-bounded martingale. For any $H \in \mathcal{E}$ of the form $H_t = \sum_{i=1}^p H_{(i)} 1\{t \in (t_i, t_{i+1}]\}$, define
\[ J^M(H)_t = (H \cdot M)_t = \sum_{i=1}^p H_{(i)} \big(M_{t \wedge t_{i+1}} - M_{t \wedge t_i}\big). \]
Then $J^M$ defines an isometry from $\mathcal{E}$ (with the $L^2(M)$ scalar product) into $\mathbb{H}^2$, so it extends to an isometry from $L^2(M)$ into $\mathbb{H}^2$.

(The idea here is that integrating $1\,dM_s$ should give us back the original martingale, so integrating elementary processes gives us increments of the martingale.)


Proof. We know that $J^M(H)$ is in $\mathbb{H}^2$ (because the $H_{(i)}$ are bounded and the stopped versions of $M$ are in $\mathbb{H}^2$), and the quadratic variation process can be computed as
\[ \langle J^M(H), J^M(H) \rangle_t = \sum_{i=1}^p H_{(i)}^2 \big(\langle M, M \rangle_{t \wedge t_{i+1}} - \langle M, M \rangle_{t \wedge t_i}\big) \]
(this is only easy because $H$ takes on a very simple form). But this sum is also equal to $\int_0^t H_s^2\,d\langle M, M \rangle_s$ by the definition of the Lebesgue integral for this simple-like function. So now we can check the isometry property on $\mathcal{E}$: by the definition of the $\mathbb{H}^2$ scalar product and our observation above, we have
\[ (J^M(H), J^M(H))_{\mathbb{H}^2} = \mathbb{E}\left[\sum_{i=1}^p H_{(i)}^2 \big(\langle M, M \rangle_{t_{i+1}} - \langle M, M \rangle_{t_i}\big)\right] = \mathbb{E}\left[\int_0^\infty H_s^2\,d\langle M, M \rangle_s\right] = (H, H)_{L^2(M)}. \]
So $J^M$ is an isometry on $\mathcal{E}$, and we can extend this to $L^2(M)$ by continuity.
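The isometry on elementary processes can be spot-checked by Monte Carlo. The Python sketch below makes several assumptions for illustration: $M$ is Brownian motion stopped at time 1 (so $\langle M, M \rangle_s = s \wedge 1$), and $H$ equals $1$ on $(0, 1/2]$ and $2\,\mathrm{sign}(B_{1/2})$ on $(1/2, 1]$, which is bounded and $\mathcal{F}_{1/2}$-measurable. Then $\mathbb{E}\int_0^1 H_s^2\,ds = 0.5 + 4 \cdot 0.5 = 2.5$, and the simulation estimates $\mathbb{E}[(H \cdot M)_1^2]$.

```python
import math
import random

rng = random.Random(101)            # arbitrary seed for reproducibility
samples = 20000
sd_half = math.sqrt(0.5)            # std of a Brownian increment over time 1/2

total_sq = 0.0
for _ in range(samples):
    b_half = rng.gauss(0.0, sd_half)        # B_{1/2} - B_0
    incr = rng.gauss(0.0, sd_half)          # B_1 - B_{1/2}, independent of b_half
    h1 = 1.0                                # value of H on (0, 1/2]
    h2 = 2.0 if b_half >= 0 else -2.0       # value of H on (1/2, 1], known at time 1/2
    integral = h1 * b_half + h2 * incr      # (H . B)_1 for this elementary H
    total_sq += integral ** 2

lhs = total_sq / samples                    # Monte Carlo estimate of E[(H . B)_1^2]
rhs = 1.0 ** 2 * 0.5 + 2.0 ** 2 * 0.5       # E[ integral of H_s^2 ds ] = 2.5

print(lhs, rhs)
```

The cross term $\mathbb{E}[h_1 B_{1/2} \cdot h_2 (B_1 - B_{1/2})]$ vanishes because the second increment has mean zero given $\mathcal{F}_{1/2}$; this is the same orthogonality that drives the proof.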


Remark 102. We should have checked at some point that $J^M(H)$ doesn't depend on the representation of $H$, since $H$ is written down in an explicit form and there might be different ways to do so. But the idea is that we have $\|J^M(H - H')\|_{\mathbb{H}^2} = \|H - H'\|_{L^2(M)}$, and we say that $H$ and $H'$ are the same if the right-hand side is zero.


We'll spend the rest of the time on a useful result for Itô's formula, which says in words that "the stochastic integral commutes with the bracket."


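Proposition 103
Let $M \in \mathbb{H}^2$ and let $H \in L^2(M)$. Then $H \cdot M$ is the unique element of $\mathbb{H}^2$ such that $\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$ for all $N \in \mathbb{H}^2$.

In particular, applying this result twice tells us that
\[ \langle H \cdot M, K \cdot N \rangle = HK \cdot \langle M, N \rangle \]
as long as everything is well-defined (meaning $M, N \in \mathbb{H}^2$, $H \in L^2(M)$, $K \in L^2(N)$); more explicitly, this is stating that
\[ \left\langle \int_0^{\cdot} H_s \, dM_s, \int_0^{\cdot} K_s \, dN_s \right\rangle_t = \int_0^t H_s K_s \, d\langle M, N\rangle_s. \]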


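Start of proof. Consider first an elementary process $H \in \mathcal{E}$. In this case, we know that
\[ (H \cdot M)_t = \sum_i H_{(i)} \left( M_{t \wedge t_{i+1}} - M_{t \wedge t_i} \right), \]
where each term is a martingale. The covariation of this with $N$ is just
\[ \langle H \cdot M, N \rangle_t = \sum_i H_{(i)} \left( \langle M, N\rangle_{t \wedge t_{i+1}} - \langle M, N\rangle_{t \wedge t_i} \right) = \int_0^t H_s \, d\langle M, N\rangle_s = (H \cdot \langle M, N\rangle)_t, \]
so the identity holds in that case. For a general $H$, we can approximate $H$ by elementary processes $H^n \in \mathcal{E}$, and we wish to show that
\[ \langle H \cdot M, N \rangle_\infty = \lim_{n \to \infty} \langle H^n \cdot M, N \rangle_\infty = \lim_{n \to \infty} (H^n \cdot \langle M, N\rangle)_\infty = (H \cdot \langle M, N\rangle)_\infty. \]
It just remains to justify the limits on the left and right. We proved the Kunita-Watanabe inequality last time, which says that
\[ \left| \int_0^\infty H_s K_s \, d\langle M, N\rangle_s \right| \le \left( \int_0^\infty H_s^2 \, d\langle M, M\rangle_s \right)^{1/2} \left( \int_0^\infty K_s^2 \, d\langle N, N\rangle_s \right)^{1/2}. \]
Thus for any $X \in \mathbb{H}^2$, we have by Kunita-Watanabe (applied with both integrands equal to 1 and with the martingales $X$ and $N$) and then Cauchy-Schwarz that
\[ \mathbb{E}\left[ |\langle X, N\rangle_\infty| \right] \le \mathbb{E}\left[ \langle X, X\rangle_\infty^{1/2} \langle N, N\rangle_\infty^{1/2} \right] \le \mathbb{E}\left[ \langle X, X\rangle_\infty \right]^{1/2} \mathbb{E}\left[ \langle N, N\rangle_\infty \right]^{1/2} = \|X\|_{\mathbb{H}^2} \|N\|_{\mathbb{H}^2}, \]
meaning that taking the quadratic covariation with $N$ is continuous with respect to the $\mathbb{H}^2$ norm. Thus, the left limit is justified by taking $X = (H - H^n) \cdot M$ and noting that $H^n \cdot M$ converges to $H \cdot M$ in the $\mathbb{H}^2$ metric.

For the right limit, apply Kunita-Watanabe and Cauchy-Schwarz again, now with integrands $X \in L^2(M)$ and $K = 1$, to find
\[ \mathbb{E}\left[ |(X \cdot \langle M, N\rangle)_\infty| \right] \le \|X\|_{L^2(M)} \|N\|_{\mathbb{H}^2}. \]
The proof is therefore completed by a similar continuity argument by taking $X = H - H^n$, which we've shown converges to zero in $L^2(M)$.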


In summary, all of the properties are easy to prove when $H$ is an elementary process, and then we just need to take appropriate limits.


12 March 11, 2020 


This will be the last class we have in person; we'll continue after spring break on Zoom. If we don't have efficient high-speed internet, we should let Professor Sun know by email. We started talking about stochastic integration last time: if $\mathbb{H}^2$ denotes the (Hilbert) space of $L^2$-bounded martingales, we defined a scalar product on $\mathbb{H}^2$ via $(M, N)_{\mathbb{H}^2} = \mathbb{E}[M_\infty N_\infty] = \mathbb{E}[\langle M, N\rangle_\infty]$. We also defined the Hilbert spaces
\[ L^2(M) = \left\{ \text{progressive } H : \|H\|_{L^2(M)}^2 = \mathbb{E}\left[ \int_0^\infty H_s^2 \, d\langle M, M\rangle_s \right] < \infty \right\}. \]
Then for any $M \in \mathbb{H}^2$, we can define the stochastic integral, which is an isometry $J^M$ from $L^2(M)$ to $\mathbb{H}^2$ sending $H$ to $H \cdot M$. One property of $H \cdot M$ is that it is the unique element in $\mathbb{H}^2$ such that for all $N \in \mathbb{H}^2$, we have
\[ \langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle. \tag{1} \]
From now on, we'll use the notation $\langle M \rangle = \langle M, M \rangle$. Our next goal is to extend stochastic integration to local martingales (and therefore to semimartingales). We'll start with a few remarks: if $M$ is a local martingale and $\tau$ is a stopping time, then the stopped process $M^\tau_t = M_{t \wedge \tau}$ is also a local martingale (this follows straightforwardly from the definitions). In particular, since $M_t^2 - \langle M\rangle_t$ is a local martingale, $(M^\tau_t)^2 - \langle M\rangle_{t \wedge \tau}$ is also a local martingale. But there is a unique process that yields a local martingale when we subtract it off from $(M^\tau)^2$, namely $\langle M^\tau \rangle$. Thus
\[ \langle M^\tau \rangle_t = \langle M \rangle_{t \wedge \tau} \implies \langle M^\tau \rangle = \langle M \rangle^\tau. \]
Similarly, if $M, N$ are both local martingales, then we also have
\[ \langle M^\tau, N^\tau \rangle = \langle M, N^\tau \rangle = \langle M, N \rangle^\tau. \tag{2} \]
This last fact is a bit harder to prove, but we can read it on our own.


Lemma 104
If $M \in \mathbb{H}^2$ and $H \in L^2(M)$, then for all stopping times $\tau$, we have
\[ (H \cdot M)^\tau = (H 1_{[0,\tau]}) \cdot M = H \cdot (M^\tau). \]

Proof. Applying Eq. (2) and then Eq. (1), note that for any $N \in \mathbb{H}^2$, we have
\[ \langle (H \cdot M)^\tau, N \rangle_t = \langle H \cdot M, N \rangle_{t \wedge \tau} = (H \cdot \langle M, N\rangle)_{t \wedge \tau}. \]
But the right-hand side is just the integral of $H$ against a finite-variation process, stopped at time $\tau$, so we can also write it as $((H 1_{[0,\tau]}) \cdot \langle M, N\rangle)_t$. Thus, $(H \cdot M)^\tau$ satisfies the characterizing property Eq. (1) for $(H 1_{[0,\tau]}) \cdot M$, so we've shown the first equality.

For the second equality, we similarly have that
\[ \langle H \cdot (M^\tau), N \rangle = H \cdot \langle M^\tau, N \rangle = H \cdot (\langle M, N\rangle^\tau). \]
Again, this is the integral of $H$ against a finite-variation process, so we can write it as $(H 1_{[0,\tau]}) \cdot \langle M, N\rangle$. But again this means we've shown the characterizing property, so the result follows.


To extend our stochastic integral definition, we'll need to be a bit more general: 


Definition 105
For a local martingale $M$, define the space $L^2_{\mathrm{loc}}(M)$ via
\[ L^2_{\mathrm{loc}}(M) = \left\{ \text{progressive } H : \int_0^t H_s^2 \, d\langle M\rangle_s < \infty \text{ for all finite } t \text{, almost surely} \right\}. \]

Recall that to be in $L^2(M)$, we integrate from 0 to $\infty$ and require the expectation to be finite, but here we only need the integral to be almost surely finite. The main goal of today is to prove the following result:


Theorem 106
Let $M$ be a local martingale and let $H \in L^2_{\mathrm{loc}}(M)$. Then there exists a unique local martingale $H \cdot M$ with initial value 0 such that for all local martingales $N$, we have
\[ \langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle. \]

Once we prove this, it will make sense to write $(H \cdot M)_t = \int_0^t H_s \, dM_s$.

Proof. Let $M_0 = 0$ without loss of generality (since nothing depends on the initial value). We'll start by talking about how to construct this process: we want to extend our definition from the $L^2$-bounded case, so we define the stopping times
\[ T_n = \inf\left\{ t \ge 0 : \int_0^t (1 + H_s^2) \, d\langle M\rangle_s \ge n \right\}. \]
We can check that $T_n$ goes to $\infty$ almost surely: we assumed $H$ is in $L^2_{\mathrm{loc}}(M)$ and $\langle M \rangle$ is a finite variation process, so neither term in the integrand gets large too fast. Furthermore, the total quadratic variation of the stopped process $M^{T_n}$ will be (almost surely) at most $n$, so $\mathbb{E}[\langle M^{T_n}\rangle_\infty]$ is finite and therefore $M^{T_n} \in \mathbb{H}^2$. Similarly, we have $H \in L^2(M^{T_n})$ because the integral $\int_0^{T_n} H_s^2 \, d\langle M\rangle_s$ is bounded by $n$.

This implies immediately that for each $n$, $X^n = H \cdot (M^{T_n})$ is well-defined and is an element of $\mathbb{H}^2$. We wish to show that for any $m > n$, $X^m$ and $X^n$ are consistently defined, but we have
\[ (X^m)^{T_n} = (H \cdot M^{T_m})^{T_n} = H \cdot ((M^{T_m})^{T_n}) = H \cdot M^{T_n} \]
by definition, then by Lemma 104, and then because $T_n \le T_m$. This right-hand side is $X^n$, so we do indeed have consistency, and in particular there is some $X$ such that $X^{T_n} = X^n$ for all $n$. Furthermore, each stopped process $X^{T_n}$ is an $L^2$-bounded martingale (so in particular it is uniformly integrable), so $X$ is a local martingale, as desired.

Next, we need to check that this $X = H \cdot M$ satisfies the desired property $\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$. Assume without loss of generality that $N_0 = 0$. We'll basically use the fact that the desired property holds at stopping times: define $\sigma_n = \inf\{t \ge 0 : |N_t| \ge n\}$ and let $\gamma_n = \sigma_n \wedge T_n$. Notice that $N^{\sigma_n}$ is a local martingale bounded by $n$, so it is a true martingale in $\mathbb{H}^2$. Then we have
\[ \begin{aligned} \langle H \cdot M, N \rangle^{\gamma_n} &= \langle (H \cdot M)^{T_n}, N^{\sigma_n} \rangle && \text{(by Eq. (2))} \\ &= \langle H \cdot (M^{T_n}), N^{\sigma_n} \rangle && \text{(by Lemma 104)} \\ &= H \cdot \langle M^{T_n}, N^{\sigma_n} \rangle && \text{(by Eq. (1))} \\ &= H \cdot (\langle M, N \rangle^{\gamma_n}) \\ &= (H \cdot \langle M, N \rangle)^{\gamma_n}, \end{aligned} \]
where the last step comes from the definition of integrating against a finite variation process. But now taking $n \to \infty$ yields the desired property for $X = H \cdot M$, since both $T_n$ and $\sigma_n$ go to infinity as $n \to \infty$.

Finally, we show that $X$ is indeed unique: if there is some $\tilde{X}$ such that $\langle \tilde{X}, N \rangle = H \cdot \langle M, N \rangle = \langle X, N \rangle$, then $\langle X - \tilde{X}, N \rangle = 0$ for all local martingales $N$. But take $N = X - \tilde{X}$ to show that we have a local martingale $X - \tilde{X}$ with initial value 0 and quadratic variation 0, which can only happen if $X - \tilde{X} = 0$.


Notice that Lemma 104 now also extends to the case where $M$ is a local martingale and $H$ is in $L^2_{\mathrm{loc}}(M)$, because we've now proved the analogous statements for local martingales.

We'll next discuss something useful for our next homework. Recall that if $M \in \mathbb{H}^2$ (the $L^2$-bounded case) and $H \in L^2(M)$, we have two nice properties: first of all, $\mathbb{E}\left[ \int_0^t H_s \, dM_s \right] = 0$ (since $H \cdot M = \int H_s \, dM_s$ is also a martingale). We can also calculate the second moment in the following way: we have
\[ \mathbb{E}\left[ \int_0^t H_s^2 \, d\langle M\rangle_s \right] = \|H 1_{[0,t]}\|_{L^2(M)}^2 = \|(H 1_{[0,t]}) \cdot M\|_{\mathbb{H}^2}^2 = \|(H \cdot M)^t\|_{\mathbb{H}^2}^2, \]
first by definition, then by the Itô isometry, and finally by Lemma 104. And since the squared $\mathbb{H}^2$ norm is just the expectation of the square of the martingale's eventual value, this is exactly $\mathbb{E}[((H \cdot M)_t)^2] = \mathbb{E}\left[ \left( \int_0^t H_s \, dM_s \right)^2 \right]$ (the second moment we're after).

It's important to note that these equations don't necessarily hold if we're in the general situation where $M$ is a local martingale. However, they do hold under restricted conditions, for example if $\mathbb{E}\left[ \int_0^t H_s^2 \, d\langle M\rangle_s \right]$ is finite. This is because $X = (H \cdot M)^t$ has total quadratic variation $\langle X \rangle_\infty = \int_0^t H_s^2 \, d\langle M\rangle_s$ (by the characterizing property of the stochastic integral), meaning that if we assume this quantity has finite expectation, then $X \in \mathbb{H}^2$. In general, if we're interested in the second moment of $\int_0^t H_s \, dM_s$, we have an upper bound
\[ \mathbb{E}\left[ \left( \int_0^t H_s \, dM_s \right)^2 \right] \le \mathbb{E}\left[ \int_0^t H_s^2 \, d\langle M\rangle_s \right], \tag{3} \]
where if the right-hand side is finite, then we have equality. (And otherwise this is a vacuous inequality anyway.)


With this, we can now make the stochastic integral definition for semimartingales: if we have a process $X = A + M$, we want to define $H \cdot X = H \cdot A + H \cdot M$, and we want to do this for some reasonably large class of processes $H$ on which both integrals will exist:

Definition 107
A progressive process $H$ is locally bounded if $\sup_{s \le t} |H_s| < \infty$ for all finite $t$, almost surely.

For any such process, we know that for any $t$,
\[ \int_0^t |H_s| \, |dA_s| \le \left( \sup_{s \le t} |H_s| \right) \int_0^t |dA_s| < \infty, \]
so $(H \cdot A)_t$ is well-defined. Similar reasoning shows that $\int_0^t H_s^2 \, d\langle M\rangle_s < \infty$, so $H \in L^2_{\mathrm{loc}}(M)$ for any local martingale $M$ and thus $H \cdot M$ will be well-defined as well. This stochastic integral $H \cdot X$ is now also a semimartingale, and it's already written in its canonical decomposition ($H \cdot A$ is the finite variation part, and $H \cdot M$ is the local martingale part).

We'll use the remaining time to prove a useful convergence property, which is a dominated convergence type result 


for stochastic integrals. 


Proposition 108
Let $X = A + M$ be a semimartingale, and suppose that $H^n, H, K$ are locally bounded with $K \ge 0$. Fix some $t > 0$, and suppose that for all $s \le t$, we have $H^n_s \to H_s$ as $n \to \infty$ and $|H^n_s| \le K_s$ for all $n$. If $\int_0^t K_s \, |dA_s|$ and $\int_0^t K_s^2 \, d\langle M\rangle_s$ are both finite almost surely, then $(H^n \cdot X)_t$ converges in probability to $(H \cdot X)_t$.


Proof. By the usual dominated convergence theorem (since the finite-variation integral is just a Lebesgue integral and everything is dominated by $K$), we already have
\[ \int_0^t H^n_s \, dA_s \to \int_0^t H_s \, dA_s. \]
To show the other part, let $T_k = t \wedge \inf\left\{ r \ge 0 : \int_0^r K_s^2 \, d\langle M\rangle_s \ge k \right\}$. Eq. (3) implies that
\[ \mathbb{E}\left[ \left( \int_0^{T_k} (H^n_s - H_s) \, dM_s \right)^2 \right] \le \mathbb{E}\left[ \int_0^{T_k} (H^n_s - H_s)^2 \, d\langle M\rangle_s \right], \]
and we'll show that this right-hand side goes to zero by two applications of the dominated convergence theorem. First of all, inside the expectation, $H^n$ and $H$ are both dominated by $K$, so the integrand is dominated by $4K^2$. By definition of the stopping time, the integral of this up to $T_k$ will still be finite, so using the dominated convergence theorem when integrating against $d\langle M\rangle_s$ shows that the quantity inside the expectation goes to zero almost surely. Then we can use the dominated convergence theorem again to show that the whole expectation converges to zero, since the integral is uniformly dominated by $4k$, again by the definition of $T_k$.

To prove the statement we're after, now notice that
\[ \mathbb{P}\left( \left| \int_0^t (H^n_s - H_s) \, dM_s \right| > \varepsilon \right) \le \mathbb{P}\left( \left| \int_0^{T_k} (H^n_s - H_s) \, dM_s \right| > \varepsilon \right) + \mathbb{P}(T_k \ne t). \]
The second term goes to zero as $k \to \infty$ (since $\int_0^t K_s^2 \, d\langle M\rangle_s$ is finite almost surely by assumption), and then for each fixed $k$ the first term goes to zero as $n \to \infty$ because the integral inside converges to 0 in $L^2$. Thus the convergence in probability is proven.


Corollary 109
Let $X$ be a semimartingale, and let $H$ be an adapted, continuous process (in particular, this means it will be locally bounded and also progressive). Then for a sequence of subdivisions $P_n = (t_i^{(n)})$ of $[0, t]$ with mesh going to 0, we have
\[ \sum_{i=1}^{|P_n|} H_{t_{i-1}^{(n)}} \left( X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}} \right) \xrightarrow{\ \mathbb{P}\ } \int_0^t H_s \, dX_s. \]


If $X$ were a finite-variation process, it wouldn't matter whether we take $H_{t_{i-1}^{(n)}}$ or $H_{t_i^{(n)}}$ in the sum on the left-hand side. But it matters here in the local martingale case, and we can check that directly. For example, if $H = X$, then this corollary tells us that
\[ \sum_{i=1}^{|P_n|} X_{t_{i-1}^{(n)}} \left( X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}} \right) \xrightarrow{\ \mathbb{P}\ } \int_0^t X_s \, dX_s, \]
but if we replace $(i - 1)$ with $i$, we will instead have
\[ \sum_{i=1}^{|P_n|} X_{t_i^{(n)}} \left( X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}} \right) = \sum_{i=1}^{|P_n|} X_{t_{i-1}^{(n)}} \left( X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}} \right) + \sum_{i=1}^{|P_n|} \left( X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}} \right)^2 \xrightarrow{\ \mathbb{P}\ } \int_0^t X_s \, dX_s + \langle X \rangle_t. \]
However, we can add these two statements together and find that
\[ \sum_{i=1}^{|P_n|} \left( X_{t_i^{(n)}} + X_{t_{i-1}^{(n)}} \right) \left( X_{t_i^{(n)}} - X_{t_{i-1}^{(n)}} \right) = \sum_{i=1}^{|P_n|} \left( X_{t_i^{(n)}}^2 - X_{t_{i-1}^{(n)}}^2 \right) = X_t^2 - X_0^2, \]
so $X_t^2 - X_0^2 = 2 \int_0^t X_s \, dX_s + \langle X \rangle_t$. This is actually a special case of Itô's formula, which tells us (in one dimension) that for sufficiently smooth $f$, we have
\[ f(X_t) - f(X_0) = \int_0^t f'(X_s) \, dX_s + \frac{1}{2} \int_0^t f''(X_s) \, d\langle X \rangle_s. \]
We can understand this statement in full now: if $f$ is twice continuously differentiable, then $f'(X)$ is a continuous, adapted process, so the first term on the right-hand side is a semimartingale (it's an integral of an adapted process against a semimartingale), and the second term is a finite variation term.
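As a quick numerical illustration of the left-endpoint versus right-endpoint computation above, here is a minimal sketch that simulates one discretized Brownian path and compares the two Riemann sums. The function names, the step count, and the seed are our own illustrative choices (they are not from the lecture), and a random-walk discretization only approximates Brownian motion.

```python
import math
import random


def simulate_brownian_path(n_steps, t_final, seed):
    """Return [B_0, ..., B_{t_final}] on a uniform grid, using Gaussian increments."""
    rng = random.Random(seed)
    dt = t_final / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def riemann_sums(path):
    """Left-endpoint sum, right-endpoint sum, and sum of squared increments."""
    increments = [path[i + 1] - path[i] for i in range(len(path) - 1)]
    left = sum(path[i] * increments[i] for i in range(len(increments)))
    right = sum(path[i + 1] * increments[i] for i in range(len(increments)))
    quad_var = sum(dx * dx for dx in increments)
    return left, right, quad_var


# Hypothetical parameters chosen only for illustration.
path = simulate_brownian_path(n_steps=10000, t_final=1.0, seed=0)
left, right, quad_var = riemann_sums(path)

print("left-endpoint sum        :", left)
print("right-endpoint sum       :", right)
print("sum of squared increments:", quad_var)  # should be close to <B>_1 = 1
print("B_1^2 - B_0^2            :", path[-1] ** 2 - path[0] ** 2)
```

The two sums differ by exactly the sum of squared increments, and they add up to $X_t^2 - X_0^2$ by telescoping; only the statement that the squared increments approximate $\langle B \rangle_1 = 1$ is a probabilistic approximation.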

We'll end here for now: all of the cool applications of stochastic calculus will unfortunately have to be done via the internet. Because we have two canceled lectures, we'll be asked (if possible) to read the section about Itô's formula. As a reminder, the midterm tomorrow is canceled, and class resumes again after spring break. And from 3:30-5:00, Professor Sun will be in room 2-175 for general advising hours.


13. March 30, 2020 


Our first midterm will now be on Thursday during the usual timeslot — if this doesn’t work for us, we should email 
Professor Sun. (The system will put us in a virtual waiting room, and we will be “admitted” into the internet office.) 
There will be no lecture on Wednesday due to the exam, so we'll have office hours during the lecture timeslot instead. 

First of all, for a quick review of stochastic integration, we should scroll down to the bottom of the course webpage and read the "Review of stochastic integration" section. The main points are that we can define stochastic integration as an isometry in the $L^2$-bounded case by approximating with elementary processes, and then in general we can use the characterization $\langle H \cdot M, N \rangle = H \cdot \langle M, N \rangle$ if $M$ and $N$ are local martingales. We'll start today with Itô's formula, and we're going to state it in multiple dimensions here:


Theorem 110 (Itô's formula)
Let $X_t = (X_t^1, \cdots, X_t^d)$ be a process that evolves in $\mathbb{R}^d$ such that each $X^i$ is a continuous semimartingale. Then for a twice continuously differentiable function $F : \mathbb{R}^d \to \mathbb{R}$,
\[ F(X_t) - F(X_0) = \sum_{i=1}^d \int_0^t \partial_i F(X_s) \, dX_s^i + \frac{1}{2} \sum_{i,j=1}^d \int_0^t \partial_{ij} F(X_s) \, d\langle X^i, X^j \rangle_s \]
is also a continuous semimartingale with the above decomposition, meaning the first term is a sum of stochastic integrals against the $X^i$ (a local martingale when the $X^i$ are local martingales) and the second term is a finite variation process.

Remember that the first term is integration with respect to a semimartingale, and the second is essentially a Lebesgue integral, so we do understand all of the individual components of this statement. We won't prove this during class; we should read the proof in Le Gall on our own.


Remark 111. This formula holds even if $F$ is defined only on an open set $U \subset \mathbb{R}^d$, as long as $t < T_\varepsilon = \inf\{t : \mathrm{dist}(X_t, U^c) \le \varepsilon\}$ for some $\varepsilon > 0$. In other words, we need to stay at least $\varepsilon$ away from the boundary so that we can define a function $\tilde{F}$ on all of $\mathbb{R}^d$ consistent with $F$.


Proposition 112
If $M$ is a local martingale, then $\mathcal{E}(\lambda M)_t = \exp\left( \lambda M_t - \frac{\lambda^2}{2} \langle M \rangle_t \right)$ is also a local martingale for all $\lambda \in \mathbb{C}$.

Proof. Applying Itô's formula with $X_t = \lambda M_t - \frac{\lambda^2}{2} \langle M \rangle_t$ (so that $\mathcal{E}(\lambda M)_t = \exp(X_t)$) and $F(x) = e^x$, we have
\[ d\mathcal{E}_t = \exp(X_t) \, dX_t + \frac{1}{2} \exp(X_t) \, d\langle X \rangle_t, \]
because the derivatives of the exponential are just the exponential itself. Substituting in the values of $X_t$ and $\langle X \rangle_t = \lambda^2 \langle M \rangle_t$, we have
\[ d\mathcal{E}_t = \exp(X_t) \left[ \lambda \, dM_t - \frac{\lambda^2}{2} \, d\langle M \rangle_t + \frac{\lambda^2}{2} \, d\langle M \rangle_t \right], \]
because $X_t$ gets its quadratic variation only from the local martingale part ($\lambda M_t$). But the last two terms cancel, so we only have a local martingale term and thus $\mathcal{E}(\lambda M)_t$ is indeed a local martingale.


Theorem 113 (Lévy's characterization of Brownian motion)
If $X$ is a continuous adapted process on $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ taking values in $\mathbb{R}^d$, then the following are equivalent:

• $X$ is a Brownian motion with respect to $\mathcal{F}_t$,

• The $X^i$ are continuous local martingales with $\langle X^i, X^j \rangle_t = t \cdot 1\{i = j\}$.

Proof. The forward direction is easy, since Brownian motion is a continuous local martingale and a Brownian motion in $\mathbb{R}^d$ is just $d$ independent Brownian motions. For the reverse direction, we'll want to use the Fourier transform, so it's natural to consider the exponential martingale $\mathcal{E}(i\theta \cdot X)_t$ for $\theta \in \mathbb{R}^d$. By assumption, the quadratic variation is $\langle \theta \cdot X \rangle_t = |\theta|^2 t$, so
\[ \mathcal{E}(i\theta \cdot X)_t = \exp\left( i\theta \cdot X_t + \frac{1}{2} |\theta|^2 t \right). \]
Now $\exp(i\theta \cdot X_t)$ is bounded because it's only varying on the unit circle, and the remaining part $\exp\left( \frac{1}{2} |\theta|^2 t \right)$ is bounded on any finite interval. This means $\mathcal{E}(i\theta \cdot X)_t$ is actually a uniformly integrable martingale up to any finite time, so we can apply the optional stopping theorem to get $\mathbb{E}[\mathcal{E}(i\theta \cdot X)_t \mid \mathcal{F}_s] = \mathcal{E}(i\theta \cdot X)_s$ (for all $0 \le s \le t < \infty$). Plugging the definition in from above and rearranging terms yields
\[ \mathbb{E}\left[ \exp\left( i\theta \cdot (X_t - X_s) \right) \mid \mathcal{F}_s \right] = \exp\left( -\frac{|\theta|^2 (t - s)}{2} \right). \]
But now the left-hand side is the characteristic function of $X_t - X_s$ given $\mathcal{F}_s$, while the right-hand side is the characteristic function of $N(0, (t - s) I_d)$. So conditioned on $\mathcal{F}_s$, $X_t - X_s$ has the correct normal distribution, which means that $X$ has the same finite dimensional distributions as the Brownian motion in $\mathbb{R}^d$. Since we assumed that our sample paths are continuous, this indeed means $X$ is a Brownian motion, as desired.
We've mentioned our next result before in previous classes:

Theorem 114 (Dambis-Dubins-Schwarz)
Let $M$ be a continuous local martingale, and suppose that $\langle M \rangle_\infty = \infty$ almost surely for simplicity. Then there exists a Brownian motion $B$ such that almost surely, we have $M_t = B_{\langle M \rangle_t}$ for all $t \ge 0$.

If the quadratic variation is not infinite, then we just end up with a Brownian motion parameterized up to some time.

Proof. We're going to construct a $B$ first and then show that it satisfies Lévy's characterization. Without loss of generality, we can assume that $M_0 = 0$. Write $A_t = \langle M \rangle_t$; since $A$ is a nondecreasing process, we can define an "inverse" by defining $\tau_r = \inf\{t \ge 0 : A_t \ge r\}$. We now define $B_r = M_{\tau_r}$. We will check that this is a Brownian motion with respect to the filtration $\mathcal{G}_r = \mathcal{F}_{\tau_r}$, and we just need to check the Lévy characterization.

We know that $M^{\tau_r}$ accumulates a total quadratic variation of $r$ by definition, meaning it's a uniformly integrable martingale, which implies that $(M^{\tau_r})^2 - \langle M \rangle^{\tau_r}$ is also a uniformly integrable martingale. So by the optional stopping theorem, we have for all $r \ge s$ that
\[ B_s = M_{\tau_s} = \mathbb{E}\left[ M_{\tau_r} \mid \mathcal{F}_{\tau_s} \right] = \mathbb{E}\left[ B_r \mid \mathcal{G}_s \right]. \]
Also, $(M^{\tau_r})^2 - \langle M \rangle^{\tau_r}$ is a uniformly integrable martingale, so we can again apply the optional stopping theorem to say that for all $r \ge s$,
\[ B_s^2 - s = \mathbb{E}\left[ (M_{\tau_r})^2 - \langle M \rangle_{\tau_r} \mid \mathcal{F}_{\tau_s} \right] = \mathbb{E}\left[ B_r^2 - r \mid \mathcal{G}_s \right]. \]
So $B$ is a local martingale with quadratic variation $\langle B \rangle_s = s$, and now we just need to make sure $B$ is continuous to apply Lévy's characterization. If $A$ were strictly increasing, $\tau$ would be a continuous function, and then $B_r = M_{\tau_r}$ is a composition of two continuous functions, so it must be continuous. So the only problem is that $A_t$ may be constant on some interval $[\tau_r, \tau_{r+}]$, where $\tau_{r+} = \inf\{t : A_t > r\}$. If this interval is nontrivial, then $\tau_r < \tau_{r+}$, and we need to check if $B$ is continuous at $r$. But if $A$ is flat on this interval, $M$ must also be constant, because any local martingale with constant quadratic variation does not evolve with time (this is a lemma that we need to check, but we can read the book for details). Thus $B$ is continuous and thus it is indeed a Brownian motion.


Example 115
Recall the example from our first lecture, where we considered a holomorphic function $f : \mathbb{C} \to \mathbb{C}$ (meaning that if we write $f = u + iv$, then $f' = u_x + i v_x = v_y - i u_y$ by the Cauchy-Riemann equations).

We can now check that $f$ applied to a Brownian motion $B_t$ yields another (time-changed) Brownian motion: specifically, we have
\[ f(B_t) = \tilde{B}_{A_t}, \qquad A_t = \int_0^t |f'(B_s)|^2 \, ds, \]
where $\tilde{B}$ is a Brownian motion.


Remark 116. We're going to skip two topics in the book for now, which are the Burkholder—Davis—Gundy inequality 


and the stochastic integral representation for martingales. 


For the rest of today, we'll discuss Girsanov’s theorem, which will give us some practice working with all of 
the objects we've encountered so far. We did a lot of exercises involving change of measure, and we'll start with 


exponential change of measure here. 
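Example 117
Suppose we have a random variable $X$ with moment generating function $m(\theta) = \mathbb{E}[e^{\theta X}] < \infty$. Define a new "tilted" probability measure with the Radon-Nikodym derivative
\[ \frac{d\mathbb{P}_\theta}{d\mathbb{P}} = \frac{e^{\theta X}}{m(\theta)}, \]
meaning that the probability of an event $A$ is $\mathbb{P}_\theta(A) = \mathbb{E}\left[ 1_A \frac{e^{\theta X}}{m(\theta)} \right]$.

The reason for the normalizing constant $m(\theta)$ is that we want $\mathbb{P}_\theta(\Omega) = 1$. In particular, under this new measure, notice that
\[ \mathbb{E}_\theta[X] = \mathbb{E}\left[ X \frac{d\mathbb{P}_\theta}{d\mathbb{P}} \right] = \frac{\mathbb{E}[X e^{\theta X}]}{\mathbb{E}[e^{\theta X}]} = \frac{m'(\theta)}{m(\theta)}, \]
so tilting the distribution to $\mathbb{P}_\theta$ changes the mean as well.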

Example 118
As an explicit example, let's exponentially tilt the standard normal $X \sim N(0, 1)$ with $\frac{d\mathbb{P}_\theta}{d\mathbb{P}} = \frac{e^{\theta X}}{\exp(\theta^2/2)}$.

Then the density of $X$ under $\mathbb{P}_\theta$ is the product of the density of $X$ under $\mathbb{P}$ and the Radon-Nikodym derivative, which factors nicely as
\[ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) \cdot \exp\left( \theta x - \frac{\theta^2}{2} \right) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} (x - \theta)^2 \right). \]
So $X$ is now distributed as $N(\theta, 1)$. In other words, this particular exponential tilt just moves the center of our distribution.
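The factorization in Example 118 is easy to sanity-check numerically. The sketch below is our own illustration (the function names, the value $\theta = 0.7$, and the grid are arbitrary assumptions): it compares the tilted density with the $N(\theta, 1)$ density pointwise, and approximates the tilted mean by a Riemann sum.

```python
import math


def standard_normal_density(x):
    """Density of N(0, 1) at x."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def tilt_factor(x, theta):
    """Radon-Nikodym derivative e^{theta x} / m(theta), with m(theta) = exp(theta^2 / 2)."""
    return math.exp(theta * x - theta * theta / 2.0)


def tilted_density(x, theta):
    """Density of X under the tilted measure: original density times the tilt factor."""
    return standard_normal_density(x) * tilt_factor(x, theta)


theta = 0.7  # arbitrary illustrative value

# Pointwise comparison against the N(theta, 1) density on a small grid.
grid = [-4.0 + 0.5 * k for k in range(17)]
max_gap = max(
    abs(tilted_density(x, theta) - standard_normal_density(x - theta)) for x in grid
)

# Riemann-sum approximation of the tilted mean, which should be m'(theta)/m(theta) = theta.
step = 0.001
xs = [-10.0 + step * k for k in range(20001)]
tilted_mean = sum(x * tilted_density(x, theta) for x in xs) * step

print("largest pointwise gap:", max_gap)
print("tilted mean          :", tilted_mean)
```

The pointwise gap is only floating-point error, and the tilted mean matches $m'(\theta)/m(\theta) = \theta$ from Example 117, since here $m(\theta) = e^{\theta^2/2}$.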


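Example 119
Next, suppose $\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix} \right)$ is bivariate normal and $\frac{d\mathbb{Q}}{d\mathbb{P}} = \exp\left( \theta Y - \frac{\theta^2}{2} \right)$.

We can repeat the calculation above, but another way to work through this is to use the characteristic functions: it suffices to calculate $\mathbb{E}_{\mathbb{Q}}[e^{itX}] = \mathbb{E}_{\mathbb{P}}\left[ e^{itX} e^{\theta Y - \theta^2/2} \right]$ for all real numbers $t$. This can be done by replacing $Y$ with $\alpha X + \sqrt{1 - \alpha^2}\, W$, where $W$ is a standard normal independent of $X$ (consistent with the covariance between $X$ and $Y$). Plugging this in, we find that
\[ \mathbb{E}_{\mathbb{Q}}[e^{itX}] = \exp\left( \frac{1}{2} (it + \theta\alpha)^2 + \frac{\theta^2 (1 - \alpha^2)}{2} - \frac{\theta^2}{2} \right) = \exp\left( -\frac{t^2}{2} + it\theta\alpha \right), \]
where the first term comes from the characteristic function of $X$ and the second term comes from the characteristic function of $W$. This means that $X$ will now be distributed as $N(\theta\alpha, 1)$ under $\mathbb{Q}$.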


In both of those cases, we had a finite number of random variables, and now we'll think about a more general case: 


Example 120
Consider a sequence of iid random variables $X_1, \cdots, X_n$ under $\mathbb{P}$, and define the change of measure $\frac{d\mathbb{Q}}{d\mathbb{P}} = \prod_{i=1}^n \frac{e^{\theta X_i}}{m(\theta)}$ by changing the measure for each $X_i$. Then the $X_i$ are iid under $\mathbb{Q}$ as well, and now the process $S_n = \sum_{i=1}^n X_i$ is a random walk under both $\mathbb{P}$ and $\mathbb{Q}$ (just with different jump distributions).

Notice that for any finite $n$, we can also write this change of measure as
\[ \frac{d\mathbb{Q}}{d\mathbb{P}} = \exp\left( \theta S_n - n \log m(\theta) \right), \]
but if we take $n \to \infty$, $\mathbb{Q}$ may not be absolutely continuous with respect to $\mathbb{P}$. Motivated by our earlier examples, consider a sequence of independent random variables distributed under $\mathbb{P}$ as
\[ \begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \alpha_i \\ \alpha_i & 1 \end{pmatrix} \right). \]
Then we can consider the two processes $M_n = \sum_{i=1}^n \sigma_i X_i$ and $L_n = \sum_{i=1}^n \tau_i Y_i$, where the coefficients $\sigma_i, \tau_i \in \mathcal{F}_{i-1}$ can also be random. (So at each stage, we add a Gaussian increment times some random number which is measurable with respect to the past.) Note that $M$ and $L$ are both martingales, and now we can define the "exponential martingale for $L$"
\[ D_n = \exp\left( L_n - \frac{1}{2} \sum_{i=1}^n \tau_i^2 \right). \]
We can check that $D_n$ is also a martingale (this should look very similar to the continuous exponential martingale we talked about earlier in class), and for any finite time, we can now define a tilt $\frac{d\mathbb{Q}}{d\mathbb{P}} = D_n$. By applying our above argument, $X_i \sim N(\alpha_i \tau_i, 1)$ under $\mathbb{Q}$, which means that $M_n$ under $\mathbb{Q}$ behaves like $M_n$ under $\mathbb{P}$ plus an extra drift term $\sum_{i=1}^n \sigma_i \alpha_i \tau_i$.

In other words, we find that $M_n - \sum_{i=1}^n \sigma_i \alpha_i \tau_i$ is a martingale under $\mathbb{Q}$ (while $M_n$ itself is a martingale under $\mathbb{P}$). And this drift term is kind of a measure of the covariation of $M$ with $L$: since $M$ is the sum of $\sigma_i X_i$ and $L$ is the sum of $\tau_i Y_i$, it makes sense that there is a covariation of $\alpha_i \sigma_i \tau_i$ at each step. The point of Girsanov's theorem is to give a continuous-time version of this:


Theorem 121 (Girsanov's theorem, informal)
Let $M$ and $L$ be local martingales under $\mathbb{P}$, and define a change of measure via $\frac{d\mathbb{Q}}{d\mathbb{P}} = D_\infty = \mathcal{E}(L)_\infty$. Then $M - \langle M, L \rangle$ is a local martingale under $\mathbb{Q}$.

The resemblance between this result and the discrete case should be clear. We should be a bit careful here: it's not always true that $\mathcal{E}(L)_\infty$ is a valid Radon-Nikodym derivative because of absolute continuity, and let's see how that can fail in the discrete case. Suppose we have probability measures $\mu, \nu$ on $(\Omega, \mathcal{F})$ with $\mathcal{F}_n \uparrow \mathcal{F}$, and suppose that $\nu_n = \nu|_{\mathcal{F}_n}$ are absolutely continuous with respect to $\mu_n = \mu|_{\mathcal{F}_n}$ for all $n$. Then we know that $D_n = \frac{d\nu_n}{d\mu_n}$ is a martingale under $\mu$, but it may not need to converge under $\nu$, so we define $D_\infty = \limsup D_n$. We can then decompose $\nu$ into a continuous and singular part as
\[ \nu(A) = \int_A D_\infty \, d\mu + \nu(A \cap \{D_\infty = \infty\}), \]
which just tells us that we may not have absolute continuity as long as there is a positive chance that $D_n$ diverges in the limit.


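Example 122
Let $\Omega = \prod_{i=1}^\infty \{0, 1\}$, and let $\mathcal{F}_n = \left\{ A \times \prod_{i=n+1}^\infty \{0, 1\} : A \subseteq \{0, 1\}^n \right\}$ be the set of events that only depend on the first $n$ variables. Suppose that $\mu = \bigotimes_{i=1}^\infty \mathrm{Ber}(p)$ and $\nu = \bigotimes_{i=1}^\infty \mathrm{Ber}(q)$ for some $0 < p < q < 1$.

It's clear that $\mu$ and $\nu$ are not absolutely continuous with respect to each other: a sample $X \sim \mu$ looks like $(X_1, X_2, \cdots)$ where the $X_i$ are iid Bernoulli with parameter $p$, and similarly $X \sim \nu$ looks like iid Bernoullis with parameter $q$. In particular, $\frac{1}{n} \sum_{i=1}^n X_i$ converges to $p$ under one measure and $q$ under the other by the law of large numbers.

But in the discrete case, we know that $\mu_n$ is just the law of $(X_1, \cdots, X_n)$ under $\mu$. Even if $\mu$ and $\nu$ are singular with respect to each other, we do have $\mu_n \ll \nu_n \ll \mu_n$ for all finite $n$: "seeing $n$ bits doesn't tell us for sure whether it comes from an iid Bernoulli $p$ or iid Bernoulli $q$." Indeed, we can calculate explicitly that
\[ D_n = \frac{d\nu_n}{d\mu_n} = \prod_{i=1}^n \frac{e^{\theta X_i}}{m(\theta)}, \]
where $\theta$ is chosen exactly so that the mean shifts from $p$ to $q$, meaning it satisfies
\[ \mathbb{E}_\nu[X_i] = \frac{\mathbb{E}_\mu[X_i e^{\theta X_i}]}{\mathbb{E}_\mu[e^{\theta X_i}]} = \frac{p e^\theta}{p e^\theta + 1 - p} = q. \]
Doing the algebra (which gives $e^\theta = \frac{q(1-p)}{p(1-q)}$ and $m(\theta) = \frac{1-p}{1-q}$) and substituting back in, we find that under $\nu$, where $\sum_i X_i \approx nq$, $D_n$ concentrates around a specific point: we have
\[ D_n = \left( \frac{q(1-p)}{p(1-q)} \right)^{\sum_{i=1}^n X_i} \left( \frac{1-q}{1-p} \right)^n \approx \left( \frac{q}{p} \right)^{nq} \left( \frac{1-q}{1-p} \right)^{n(1-q)} = \exp\left( n H(q|p) \right), \]
where $H(q|p) = q \log \frac{q}{p} + (1-q) \log \frac{1-q}{1-p}$ is the binary relative entropy. In particular, $H(q|p) = 0$ if $q = p$ and otherwise $H(q|p) > 0$, meaning that if $p \ne q$ then $D_n \to \infty$ under $\nu$ and thus $\mu, \nu$ are mutually singular.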
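To make the Bernoulli computation above concrete, here is a small sketch that evaluates $\frac{1}{n} \log D_n$ on "typical" sequences. The function names and the values $p = 0.3$, $q = 0.6$, $n = 1000$ are our own illustrative assumptions; the formula for $\log D_n$ is just the log of the ratio of the two product measures on the first $n$ coordinates.

```python
import math


def relative_entropy(q, p):
    """Binary relative entropy H(q|p) = q log(q/p) + (1-q) log((1-q)/(1-p))."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def log_density_ratio(num_ones, n, p, q):
    """log D_n for a length-n bit string containing num_ones ones."""
    return num_ones * math.log(q / p) + (n - num_ones) * math.log((1.0 - q) / (1.0 - p))


p, q, n = 0.3, 0.6, 1000  # arbitrary illustrative values

# The tilt parameter theta solving p e^theta / (p e^theta + 1 - p) = q.
theta = math.log(q * (1.0 - p) / (p * (1.0 - q)))
m_theta = p * math.exp(theta) + 1.0 - p
tilted_mean = p * math.exp(theta) / m_theta

# A typical sequence under nu has about n*q ones; under mu it has about n*p ones.
rate_under_nu = log_density_ratio(round(n * q), n, p, q) / n
rate_under_mu = log_density_ratio(round(n * p), n, p, q) / n

print("tilted mean (should equal q):", tilted_mean)
print("(1/n) log D_n, typical under nu:", rate_under_nu, "vs H(q|p) =", relative_entropy(q, p))
print("(1/n) log D_n, typical under mu:", rate_under_mu, "vs -H(p|q) =", -relative_entropy(p, q))
```

So $D_n$ grows exponentially on sequences typical for $\nu$ and decays exponentially on sequences typical for $\mu$, which is the mutual singularity in the limit.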
Basically, we should remember that Girsanov’s theorem doesn’t work for arbitrary L, so we need to understand 


which L actually produce a valid change of measure in the continuous case. We'll discuss this more next time! 


14 April 6, 2020 


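We'll finish the discussion of Girsanov's theorem today, starting by recalling last week's calculation. Suppose we're on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and we have independent Gaussians of the bivariate distribution
\[ \begin{pmatrix} X_i \\ Y_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \alpha_i \\ \alpha_i & 1 \end{pmatrix} \right) \]
such that $\mathcal{F}_n$ is the sigma-algebra generated by the first $n$ of these variables. If $\sigma_i, \tau_i$ are bounded random variables that are $\mathcal{F}_{i-1}$-measurable, we can define $M_n = \sum_{i=1}^n \sigma_i X_i$ and $L_n = \sum_{i=1}^n \tau_i Y_i$, which are martingales under $\mathbb{P}$. Considering the process up to some finite time $n$, we can calculate the Radon-Nikodym derivative
\[ \frac{d\mathbb{Q}}{d\mathbb{P}} = D_n = \prod_{i=1}^n \exp\left( \tau_i Y_i - \frac{\tau_i^2}{2} \right) = \exp\left( L_n - \frac{1}{2} \sum_{i=1}^n \tau_i^2 \right). \]
Then by the martingale property, $\left. \frac{d\mathbb{Q}}{d\mathbb{P}} \right|_{\mathcal{F}_k} = D_k$ for all $k \le n$, and $D_k$ is a discrete-time version of the exponential martingale $\mathcal{E}(L)_t = \exp\left( L_t - \frac{1}{2} \langle L \rangle_t \right)$ (remember that in the continuous case, this gives us a strictly-positive continuous local martingale). This $D$ then helps us define a change of measure: we found last time that $X_i \sim N(\alpha_i \tau_i, 1)$ is a shifted Gaussian under $\mathbb{Q}$ (conditioned on $\mathcal{F}_{i-1}$, so that we know the value of $\tau_i$), so this means that $M_n$ under $\mathbb{Q}$ looks like $M_n$ under $\mathbb{P}$ but with an extra drift term $\sum_{i=1}^n \sigma_i \tau_i \alpha_i$, which we can think of as a discrete-time version of the "covariation of $M$ with $L$."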
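The discrete-time drift computation above can be checked by Monte Carlo. The sketch below is only an illustration under simplifying assumptions of our own: the coefficients $\sigma_i, \tau_i, \alpha_i$ are taken to be constants (the notes allow them to be random and $\mathcal{F}_{i-1}$-measurable), and the sample size, seed, and function name are arbitrary. It estimates $\mathbb{E}_{\mathbb{P}}[D_n]$, which should be 1, and $\mathbb{E}_{\mathbb{Q}}[M_n] = \mathbb{E}_{\mathbb{P}}[D_n M_n]$, which should equal the drift $n \sigma \alpha \tau$.

```python
import math
import random


def estimate_tilted_moments(n_steps, sigma, tau, alpha, n_samples, seed):
    """Monte Carlo estimates of E_P[D_n] and E_P[D_n * M_n] with constant coefficients."""
    rng = random.Random(seed)
    total_d = 0.0
    total_dm = 0.0
    for _sample in range(n_samples):
        m_n = 0.0
        l_n = 0.0
        for _step in range(n_steps):
            x = rng.gauss(0.0, 1.0)
            w = rng.gauss(0.0, 1.0)
            # Y has unit variance and covariance alpha with X.
            y = alpha * x + math.sqrt(1.0 - alpha * alpha) * w
            m_n += sigma * x
            l_n += tau * y
        # Discrete exponential martingale D_n = exp(L_n - (1/2) * sum of tau_i^2).
        d_n = math.exp(l_n - 0.5 * n_steps * tau * tau)
        total_d += d_n
        total_dm += d_n * m_n
    return total_d / n_samples, total_dm / n_samples


# Hypothetical constant coefficients, chosen only for illustration.
n_steps, sigma, tau, alpha = 3, 1.0, 0.5, 0.6
mean_d, mean_m_under_q = estimate_tilted_moments(
    n_steps, sigma, tau, alpha, n_samples=100000, seed=1
)
predicted_drift = n_steps * sigma * alpha * tau

print("E_P[D_n]  (should be near 1):", mean_d)
print("E_Q[M_n]  (Monte Carlo)     :", mean_m_under_q)
print("predicted drift             :", predicted_drift)
```

Since these are Monte Carlo estimates, they agree with the predicted values only up to sampling error.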

At the end of last lecture, we stated the informal version of Girsanov's theorem: if $M$ and $L$ are local martingales under $\mathbb{P}$, we can define a change of measure via $\frac{d\mathbb{Q}}{d\mathbb{P}} = D_\infty = \mathcal{E}(L)_\infty$, and $M - \langle M, L \rangle$ will be a local martingale under $\mathbb{Q}$. We'll formalize this today: assume we are working on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$, where our filtration $\mathcal{F}_t$ is right-continuous and complete.


Proposition 123
Suppose $\mathbb{Q} \ll \mathbb{P}$. Then $D_t = \left. \frac{d\mathbb{Q}}{d\mathbb{P}} \right|_{\mathcal{F}_t}$ is a uniformly integrable martingale, so it has an rcll modification.


We'll skip the proof of this — the fact that it's a martingale is easy to check from the definition of conditional 
expectation, and the rcll modification comes from results of Chapter 3. The main point is that we'll be working with 


such rcll modifications from now on. 


Lemma 124
Suppose $D_t$ is a continuous local martingale with $D_0 = 1$ such that $D_t > 0$ for all $t$. Then we can write $D_t = \mathcal{E}(L)_t$ for some continuous local martingale $L$.

Proof. Apply Itô's formula to $Y_t = \log D_t$ to find
\[ dY_t = \frac{1}{D_t} \, dD_t - \frac{1}{2 D_t^2} \, d\langle D \rangle_t. \]
If we take $L_t$ such that $dL_t = dY_t + \frac{1}{2 D_t^2} \, d\langle D \rangle_t$ (to cancel out the finite variation term), then $L$ is a local martingale with
\[ dL_t = \frac{1}{D_t} \, dD_t \implies d\langle L \rangle_t = \frac{1}{D_t^2} \, d\langle D \rangle_t, \]
which we can substitute back in to find $dY_t = dL_t - \frac{1}{2} \, d\langle L \rangle_t$. Integrating this yields $\log D_t = Y_t = L_t - \frac{1}{2} \langle L \rangle_t$, and finally exponentiating both sides tells us that $D_t = \mathcal{E}(L)_t$, as desired.

In particular, this proof tells us the explicit formula $L_t = \int_0^t \frac{1}{D_s} \, dD_s$.


Theorem 125 (Girsanov)
Assume that $\mathbb{Q} \ll \mathbb{P}$, and $D_t = \left. \frac{d\mathbb{Q}}{d\mathbb{P}} \right|_{\mathcal{F}_t} = \mathcal{E}(L)_t$. Also assume that $\mathcal{F}_0$ is trivial, so $D_0 = 1$. If $M$ is a continuous local martingale under $\mathbb{P}$, then $M - \langle M, L \rangle$ is a continuous local martingale under $\mathbb{Q}$.

This theorem essentially tells us that the class of martingales only changes by the drift term $\langle M, L \rangle$. In particular, the quadratic variation of a continuous local martingale under $\mathbb{P}$ and under $\mathbb{Q}$ are the same.


Proof. Let $X$ be any adapted process. We first claim that if $D \cdot X$ (the product, not the stochastic integral) is a continuous martingale under $\mathbb{P}$, then $X$ is a continuous martingale under $\mathbb{Q}$. To check this, first we make sure $X_t$ is in $L^1(\mathbb{Q})$: indeed, $\mathbb{E}_{\mathbb{Q}}[|X_t|] = \mathbb{E}_{\mathbb{P}}[D_t |X_t|] = \mathbb{E}_{\mathbb{P}}[|D_t X_t|]$ (because $D$ is positive), and this right-hand side is finite by assumption of $D \cdot X$ being a martingale. Now we check the martingale property: for any $s \le t$ and any event $A \in \mathcal{F}_s$,
\[ \mathbb{E}_{\mathbb{Q}}[X_t 1_A] = \mathbb{E}_{\mathbb{P}}[D_t X_t 1_A] = \mathbb{E}_{\mathbb{P}}[D_s X_s 1_A] = \mathbb{E}_{\mathbb{Q}}[X_s 1_A], \]
where we've used the definition of change of measure in the first and third equalities and the martingale property in the second. Thus $\mathbb{E}_{\mathbb{Q}}[X_t \mid \mathcal{F}_s] = X_s$ as desired. And similarly, we can show that if $D \cdot X$ is a continuous local martingale under $\mathbb{P}$, then $X$ is a continuous local martingale under $\mathbb{Q}$.

We'll now apply this to $X = M - \langle M, L \rangle$: we want to show that this is a local martingale under $\mathbb{Q}$, so it suffices to show that $D \cdot X$ is a local martingale under $\mathbb{P}$. Remember that $D$ evolves via the formula $dD_t = \mathcal{E}(L)_t \, dL_t = D_t \, dL_t$ (see the explicit expression from Lemma 124), so using Itô's formula (with the function $F(D, X) = D \cdot X$) we have
\[ d(DX)_t = D_t \, dX_t + X_t \, dD_t + d\langle D, X \rangle_t \]
(only the mixed partial terms are nonzero for the function $F$, and the factors of 2 cancel). The term $X_t \, dD_t$ is already a local martingale increment, so we don't need to expand it out further. However, we can plug in the formula for $X_t$ and note that $X$ is $M$ plus a finite variation term, so this simplifies to
\[ X_t \, dD_t + D_t (dM_t - d\langle M, L \rangle_t) + d\langle D, X \rangle_t = X_t \, dD_t + D_t (dM_t - d\langle M, L \rangle_t) + d\langle D, M \rangle_t. \]
Now $d\langle M, L \rangle_t = \frac{1}{D_t} \, d\langle M, D \rangle_t$, so the last two terms cancel, and we're just left with $d(DX)_t = X_t \, dD_t + D_t \, dM_t$. Since there is no finite variation term (and $D_t$ is also a martingale), this indeed shows that $D \cdot X$ is a local martingale, completing the proof.


Note that this is not the typical way that we apply Girsanov's theorem. Often we start with a continuous local martingale $L$ such that $L_0 = 0$ and $\langle L\rangle_\infty < \infty$ almost surely. We know that this means $L_t$ converges almost surely to a limit $L_\infty$, and $\mathcal{E}(L)_t$ is a continuous local martingale. In particular, because it is nonnegative, it is a supermartingale, and thus it converges almost surely to $\mathcal{E}(L)_\infty$ with $E[\mathcal{E}(L)_\infty] \le 1$ by Fatou's lemma. If we have equality, then $\mathcal{E}(L)_t = D_t$ is a uniformly integrable martingale (see our homework), and thus we can define $\frac{dQ}{dP} = \mathcal{E}(L)_\infty$ and apply Girsanov with this $L$. So we need to make sure $L$ satisfies the condition $E[\mathcal{E}(L)_\infty] = 1$ to make sure all of this is valid.


Fact 126 


Theorem 5.23 in Le Gall gives a few criteria for this condition being satisfied. Specifically, if $L$ is a continuous local martingale with $L_0 = 0$, then Novikov's condition $E\left[\exp\left(\frac12\langle L\rangle_\infty\right)\right] < \infty$ implies that $L$ is a uniformly integrable martingale with $E\left[\exp\left(\frac12 L_\infty\right)\right] < \infty$ (Kazamaki's criterion), which implies that $\mathcal{E}(L)$ is a uniformly integrable martingale.


We can read the proof on our own, but we'll instead focus on applications during class. Our first one will be to 


constructing a solution for a stochastic differential equation: 


Example 127 


Suppose we want to solve the differential equation $dX_t = b(t, X_t)\,dt + dB_t$, where $b$ is a measurable function with $|b(t, x)| \le g(t)$ for some $g$ satisfying $\int_0^\infty g(t)^2\,dt < \infty$.


Solution. Let $X$ be a Brownian motion under $P$, and let $L_t = \int_0^t b(s, X_s)\,dX_s$. Since $X$ is a Brownian motion under $P$, we have $\langle L\rangle_\infty = \int_0^\infty b(t, X_t)^2\,dt$ (since $d\langle X\rangle_t = dt$), and this is finite because it is bounded by the deterministic constant $\int_0^\infty g(t)^2\,dt$. Thus, Novikov's condition is satisfied, which means that we can define a new measure $Q$ such that $\frac{dQ}{dP} = \mathcal{E}(L)_\infty$. Applying Girsanov's theorem now tells us that $B = X - \langle X, L\rangle$ is a local martingale under $Q$.

Because $B$ is $X$ minus a finite variation process, Lévy's characterization tells us that $B$ is a Brownian motion under $Q$ because it has the correct quadratic variation. But $X = \langle X, L\rangle + B$ can be rewritten in differential form as $dX_t = b(t, X_t)\,dt + dB_t$ by plugging in the definition of $L$ (which gives $d\langle X, L\rangle_t = b(t, X_t)\,dt$), and this is exactly what we wanted.


Notice that the only assumption we needed is that $b(t, x)$ is measurable and bounded by an $L^2$ function $g(t)$: no other regularity condition was required! Our next application is the Cameron-Martin formula:


Example 128 
Let $L_t = \int_0^t g(s)\,dB_s$ for some deterministic function $g(s)$, and again define $\frac{dQ}{dP} = \mathcal{E}(L)_\infty$. Then $\tilde{B}_t = B_t - \langle B, L\rangle_t = B_t - \int_0^t g(s)\,ds$ will be a Brownian motion under $Q$, as long as $E[\mathcal{E}(L)_\infty] = 1$.

This is an explicit change of measure between a Brownian motion and a Brownian motion plus a deterministic function $h(t)$ of the form $h(t) = \int_0^t g(s)\,ds$. But note that not all functions $h$ will work: for example, the law of $B_t + ct$ is not absolutely continuous with respect to the law of $B_t$, because $\frac{B_t}{t}$ goes to 0 almost surely while $\frac{B_t + ct}{t}$ goes to $c$ almost surely (so the measures are in fact mutually singular). To study this in more detail, notice that if we define

$$A_t = \langle L\rangle_t = \int_0^t g(s)^2\,ds,$$

then $L_t$ behaves as $\beta_{A_t}$, where $\beta$ is a Brownian motion. Since $A_t$ is a deterministic time change, this means that $L_t$ is distributed normally as $N(0, A_t)$, and thus if $\int_0^\infty g(t)^2\,dt = \langle L\rangle_\infty$ is finite, then $L_\infty$ is just distributed as $N(0, \langle L\rangle_\infty)$, so indeed we will have $E[\mathcal{E}(L)_\infty] = 1$. (So here, we don't need Novikov's condition to see that this last condition holds, because we can calculate the law directly.) In other words, this means that the law of $(B_t + h(t))$ is absolutely continuous with respect to the law of $B_t$ if and only if we can write $h(t) = \int_0^t g(s)\,ds$ such that $\int_0^\infty g(t)^2\,dt < \infty$; such functions $h$ form the Cameron-Martin (CM) space.
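The Cameron-Martin change of measure can be sanity-checked numerically in the simplest case. The sketch below assumes a constant integrand $g \equiv \theta$ on a finite horizon $[0, T]$ (so that $L_T = \theta B_T$ and $\mathcal{E}(L)_T = \exp(\theta B_T - \theta^2 T/2)$); the function name `tilted_moments` and the parameter values are illustrative choices, not part of the lecture. It uses Gauss-Hermite quadrature to confirm that the density has total mass 1 and that $B_T - \theta T$ has mean 0 and variance $T$ under the tilted measure.

```python
import numpy as np

def tilted_moments(theta, T, deg=60):
    """Moments of B_T - theta*T under dQ = E(L)_T dP, with L_T = theta * B_T."""
    # Nodes/weights for the weight function exp(-z^2 / 2).
    z, w = np.polynomial.hermite_e.hermegauss(deg)
    w = w / np.sqrt(2 * np.pi)           # normalise to N(0, 1) probabilities
    b = np.sqrt(T) * z                   # values of B_T under P
    density = np.exp(theta * b - 0.5 * theta**2 * T)   # E(L)_T
    total_mass = np.sum(w * density)
    mean_shifted = np.sum(w * density * (b - theta * T))
    var_shifted = np.sum(w * density * (b - theta * T) ** 2)
    return total_mass, mean_shifted, var_shifted

mass, mean, var = tilted_moments(theta=0.7, T=2.0)
print(mass, mean, var)
```

This only checks the one-dimensional marginal at time $T$; it is not a check of the full path-space statement.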


Example 129 


We'll spend the remainder of this class discussing an application to the large deviations principle — this often goes 


under the name of Schilder’s theorem. 


First, we recall Cramér's theorem, which tells us about large deviations for the empirical mean $\frac{S_n}{n} = \frac1n\sum_{i=1}^n X_i$ of iid copies of a random variable $X$. Suppose that $m(\theta) = E[e^{\theta X}]$ is finite for all $\theta \in \mathbb{R}$, allowing us to define the cumulant generating function $K(\theta) = \log m(\theta)$. Cramér's theorem then tells us that for any $a > E[X]$,

$$\frac1n \log P\left(\frac{S_n}{n} \ge a\right) \to -I(a),$$

where $I(a) = \sup_\theta(\theta a - K(\theta))$. In other words, the probability is exponentially decaying with rate given by this function $I$. We proved the upper bound by using Markov's inequality, and to show the lower bound, we used a change of measure. Specifically, choose $\theta$ so that $E_Q[X_i] = a + \varepsilon$, and define

$$\frac{dQ}{dP} = \frac{\exp(\theta S_n)}{\exp(nK(\theta))}.$$

Then the idea is that the tilted mean of $X_i$ is slightly larger than $a$, so the event $\{\frac{S_n}{n} \ge a\}$ is now a typical event (since the average of $n$ iid terms with mean slightly larger than $a$ is likely to give something larger than $a$). So

$$1 \approx Q\left(\frac{S_n}{n} \ge a\right) \approx Q\left(a \le \frac{S_n}{n} \le a + 2\varepsilon\right).$$

From there, we noticed that the Radon-Nikodym derivative is roughly constant on this event, so this is approximately

$$E_P\left[\frac{dQ}{dP};\ a \le \frac{S_n}{n} \le a + 2\varepsilon\right] \approx \exp\left(\theta n a - nK(\theta)\right) P\left(\frac{S_n}{n} \in [a, a + 2\varepsilon]\right).$$
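To make the rate function concrete, here is a small numerical sketch for the random-sign case from the start of the course, where $X = \pm 1$ with equal probability and $K(\theta) = \log\cosh\theta$. The maximiser of $\theta a - K(\theta)$ is then $\theta = \operatorname{artanh}(a)$. The helper names `rate_function` and `log_tail_prob` are hypothetical; the code compares $-\frac1n \log P(S_n/n \ge a)$, computed exactly from the binomial distribution, against $I(a)$.

```python
import math

def rate_function(a):
    """I(a) = sup_theta (theta*a - log cosh(theta)) for symmetric random signs."""
    theta = math.atanh(a)                # solves K'(theta) = tanh(theta) = a
    return theta * a - math.log(math.cosh(theta))

def log_tail_prob(n, a):
    """log P(S_n / n >= a), where S_n = 2 * Binomial(n, 1/2) - n."""
    # Small tolerance guards against floating-point round-up in the threshold.
    k_min = math.ceil(n * (1 + a) / 2 - 1e-9)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        - n * math.log(2)
        for k in range(k_min, n + 1)
    ]
    m = max(logs)                        # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs))

a = 0.3
rate = rate_function(a)
estimates = {n: -log_tail_prob(n, a) / n for n in (100, 1000, 10000)}
print(rate, estimates)
```

The estimates stay above $I(a)$ (this is the Markov-inequality upper bound) and approach it slowly, since the correction is of order $\frac{\log n}{n}$.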
We're going to do something similar now, but with Brownian motion sample paths instead: 


Theorem 130 (Schilder) 

For simplicity, consider Brownian motion on the interval $[0, T]$. Let $C[0, T]$ be the space of continuous functions with the sup-norm $\|\cdot\|_\infty$, and let $W[0, T]$ be the subspace of functions in $C[0, T]$ that start at 0. Then for any $A \subseteq W[0, T]$,

$$-\Lambda(A^\circ) \le \liminf_{\varepsilon \downarrow 0} \varepsilon \log P(\sqrt{\varepsilon}B \in A) \le \limsup_{\varepsilon \downarrow 0} \varepsilon \log P(\sqrt{\varepsilon}B \in A) \le -\Lambda(\bar{A}),$$

where $A^\circ$ is the interior of $A$, $\bar{A}$ is the closure of $A$, and $\Lambda(A) = \inf_{h \in A} I(h)$, where $I(h) = \frac12\int_0^T h'(t)^2\,dt$ is analogous to the $I(a)$ of Cramér's theorem.

The key feature here is not the appearance of the liminf and limsup; instead, it's the new function $I(h)$ that is relevant. If we cared in Cramér's theorem about a general Borel set $A$, we'd have the liminf and limsup there, too.


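Proof sketch for lower bound. Consider the set $A = \{f : \|f - h\|_\infty < \delta\}$. We want to make an analogous argument as in Cramér, turning $\{\sqrt{\varepsilon}B \in A\}$ into a typical event. Thus, consider the tilt

$$\frac{dQ}{dP} = \exp\left(\frac{1}{\sqrt{\varepsilon}}\int_0^T h'(t)\,dB_t - \frac{1}{2\varepsilon}\int_0^T h'(t)^2\,dt\right).$$

Under this new measure, the Cameron-Martin formula tells us that $B$ is basically a regular Brownian motion but with a drift term $\frac{h}{\sqrt{\varepsilon}}$, which is exactly what we want for $\sqrt{\varepsilon}B$ to be in $A$ as in the theorem statement. Thus $1 \approx Q(\sqrt{\varepsilon}B \in A) = E_P\left[\frac{dQ}{dP}1_{\{\sqrt{\varepsilon}B \in A\}}\right]$, and now we can approximately evaluate the Radon-Nikodym derivative by taking its value at $B = \frac{h}{\sqrt{\varepsilon}}$ to find that

$$1 \approx \exp\left(\frac{1}{2\varepsilon}\int_0^T h'(t)^2\,dt\right)P(\sqrt{\varepsilon}B \in A) \implies \varepsilon\log P(\sqrt{\varepsilon}B \in A) \approx -\frac12\int_0^T h'(t)^2\,dt = -I(h),$$

which is indeed the form of the desired inequality.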


If we're curious how we can extend this argument from a finite time interval to $[0, \infty)$, we can refer to the book of Deuschel and Stroock.


15 April 8, 2020 


We haven't covered everything from chapter 5, but we'll hold off on stochastic differential equations for now and 
spend the next few lectures on continuous-time Markov processes, for which the theory goes beyond Brownian-type 
processes. (Most of this comes from Le Gall chapter 6, but we'll go a bit beyond that as well.) We'll start with a 


review of the discrete-time Markov chains: 


Definition 131 


A discrete-time Markov process or Markov chain on a finite state space $E = \{1, \cdots, k\}$ is a discrete $E$-valued process $(X_n)_{n \ge 0}$ specified by a transition matrix $P \in \mathbb{R}^{k \times k}$ with entries $p_{xy} = P(X_{n+1} = y \mid X_n = x)$.


This matrix $P$ can be thought of as a map from $\mathbb{C}^k$ to $\mathbb{C}^k$, meaning that it is a linear operator, and in particular it also acts on functions $f : E \to \mathbb{C}$ via

$$(Pf)(x) = \sum_y p_{xy} f(y) = E[f(X_{n+1}) \mid X_n = x].$$

(This is the view we'll take in the continuous-time case as well.) Here, $P$ is a stochastic matrix: its rows sum to 1, so $P\mathbf{1} = \mathbf{1}$, meaning the constant vector $\mathbf{1}$ is a right eigenvector with eigenvalue 1. And if we take $n$ steps of the chain (that is, if we only observe the state every $n$th step), the transition matrix is just $P^n$; this fact (across all $n$) goes by the name of the Chapman-Kolmogorov equations. Recall the following general result:

Theorem 132 (Perron-Frobenius)
If a Markov chain is irreducible (meaning there is a sequence of steps from any state to any other state) and aperiodic (the gcd of all directed cycle lengths is 1), then 1 is a simple eigenvalue (of multiplicity 1), and all other eigenvalues have modulus strictly less than 1. Then the associated left eigenvector $\pi^*$ for the eigenvalue 1 (satisfying $\pi^* P = \pi^*$) has all positive entries, and it is the stationary distribution of the chain.


This can be thought of as a general theorem about matrices with real positive entries. The idea is that if we diagonalize $P = UDU^{-1}$, then the column vectors of $U$ are the right eigenvectors $u_i$, and the rows of $U^{-1}$ are the left eigenvectors $v_i^*$. But now we can write out

$$P^n = \sum_{i=1}^k \lambda_i^n\, u_i v_i^*.$$

And now since 1 is a simple eigenvalue but all of the other eigenvalues have modulus less than 1, $P^n$ will approach $\mathbf{1}\pi^*$ (the other contributions go away as $n \to \infty$).
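As a quick illustration of this convergence, the following sketch uses a made-up 3-state chain (the matrix `P` below is an arbitrary irreducible, aperiodic example, not one from the lecture). It extracts the left eigenvector for eigenvalue 1 and compares $P^n$ with $\mathbf{1}\pi^*$.

```python
import numpy as np

# Hypothetical irreducible, aperiodic chain (lazy walk on a 3-point path).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Left eigenvectors of P are right eigenvectors of P transposed.
eigvals, left_vecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1.0))     # index of the eigenvalue 1
pi = np.real(left_vecs[:, i])
pi = pi / pi.sum()                       # normalise to a probability vector

P_n = np.linalg.matrix_power(P, 50)      # 50-step transition matrix
limit = np.outer(np.ones(3), pi)         # the rank-one matrix 1 * pi^T

print(pi)
print(np.abs(P_n - limit).max())
```

Every row of $P^{50}$ is numerically indistinguishable from $\pi$ here, because the second-largest eigenvalue modulus of this particular matrix is $\frac12$.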
For our purposes, what will be interesting is first generalizing the state space and then turning this into a continuous process. We'll begin by looking at a general measurable state space $(E, \mathcal{E})$:


Definition 133 
A Markov transition kernel on a space $(E, \mathcal{E})$ is a map $Q : E \times \mathcal{E} \to [0, 1]$ such that

• for all $x \in E$, the function $Q(x, \cdot)$ is a probability measure on $(E, \mathcal{E})$, and

• for all $A \in \mathcal{E}$, the function $Q(\cdot, A)$ is measurable.


Here, $Q(x, \cdot)$ represents the law of the process at some future time, given that we're in state $x$ right now, analogous to the row of $x$ in the transition matrix $P$. The second condition about $Q(\cdot, A)$ is more technical and comes from generalizing the equation $(Pf)(x) = E[f(X_{n+1}) \mid X_n = x]$ from above. Specifically, if we have a measurable function $f : E \to \mathbb{C}$, then we can define $Qf(x) = \int_E f(y)\,Q(x, dy)$ (this is the Lebesgue integral of $f$ against the measure $Q(x, \cdot)$), analogously to the discrete case. We just want that for any measurable bounded function $f$, the function $Qf$ is also measurable and bounded, and this is where we use the measurability condition for $Q(\cdot, A)$. ($Qf$ being measurable follows directly for indicator functions, and then we approximate with indicators in general.)

We'll let $B(E)$ denote the set of bounded measurable functions on $E$ with the sup-norm $\|f\| = \|f\|_\infty$. Any Markov transition kernel $Q$ maps $B(E)$ to itself. In fact, $Q$ is a contractive operator, meaning that $\|Qf\| \le \|f\|$, because $Qf$ is an expectation of the function $f$ and thus uniformly bounded by its sup-norm. Note that so far, $E$ has had no conditions other than being a measurable space, but moving forward we'll require further regularity conditions (and point them out as they're needed).


Definition 134 
A transition semigroup on a state space $(E, \mathcal{E})$ is a collection of transition kernels $(Q_t)_{t \ge 0}$ which satisfy the following conditions:

• For all $x \in E$, $Q_0(x, \cdot)$ (the law of where we go in zero time) is the Dirac measure $\delta_x$.

• For all $s, t \ge 0$, $Q_{s+t} = Q_sQ_t$.

• For all measurable $A \in \mathcal{E}$, the map $(t, x) \mapsto Q_t(x, A)$ is measurable with respect to $\mathcal{B}_{[0,\infty)} \otimes \mathcal{E}$.

The second condition here is the Chapman-Kolmogorov condition, and it's less trivial than before because now we're making sure that all of our $Q_t$s are consistent. (And here $Q_sQ_t$ is defined via composition of operators on $B(E)$.)


Definition 135 
A Markov process with transition semigroup $Q_t$ on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ is an adapted process $(X_t)_{t \ge 0}$ such that $E[f(X_{s+t}) \mid \mathcal{F}_s] = (Q_tf)(X_s)$ for all bounded measurable functions $f \in B(E)$ and all $s, t \ge 0$.


Since this process has restrictions, we should make sure it does actually exist. If $E$ is nice enough (for example, if we have a Polish space), then existence of such a Markov process with the transition semigroup $(Q_t)_{t \ge 0}$ comes from the Kolmogorov extension theorem. (The index set $[0, \infty)$ is uncountable, and we just need to specify consistent finite-dimensional distributions, but those come from the $Q_t$s.) We won't check the Kolmogorov extension theorem itself, but the point is just to make sure $E$ is sufficiently nice.


Remark 136. However, just like with the construction of Brownian motion and martingales, there is no guarantee of 
sample path regularity in this definition. We may talk a bit about this later on, but we'll focus on the aspects that are 


different from what we've already seen. 


Going back to the discrete-time case, notice that we can also define a semigroup $(Q_n)_{n \ge 0}$, but the object is less useful because it's just $(I, P, P^2, P^3, \ldots)$, so in particular it's specified by a single transition matrix $P$. So the first mystery is whether there is a “basic building block” analogous to $P$ for Markov processes in continuous time which encodes information of the entire semigroup $(Q_t)_{t \ge 0}$. The answer is “generally yes,” but the story is more complicated: this is only true for Feller processes, which we'll define later. To make that analogy, we'll need a few more concepts:


Definition 137 
The $\lambda$-resolvent of a semigroup $(Q_t)_{t \ge 0}$ (for some $\lambda > 0$) is the operator $R_\lambda : B(E) \to B(E)$ such that

$$(R_\lambda f)(x) = \int_0^\infty e^{-\lambda t}\,Q_tf(x)\,dt.$$

We should think of this as the Laplace transform (in the time-coordinate) of the semigroup. Remember that $Q_tf(x)$ is the expectation of $f$ at time $t$, given that we're currently at $x$. Since the exponential distribution $\mathrm{Exp}(\lambda)$ has density $\lambda e^{-\lambda t}$, what this definition is really saying is that

$$(R_\lambda f)(x) = \frac{1}{\lambda}E[f(X_\tau) \mid X_0 = x],$$

where $\tau$ is a random time (independent of the process) distributed according to $\mathrm{Exp}(\lambda)$. (So when $\lambda$ is large, we emphasize times close to 0, and vice versa.)


Lemma 138 (Resolvent equation) 


For any $\lambda, \mu > 0$, we have $R_\lambda - R_\mu + (\lambda - \mu)R_\lambda R_\mu = 0$.

Proof. It's enough to prove this for $\lambda \ne \mu$ (otherwise this is clearly 0). The composition of the two resolvents is

$$\boxed{(R_\lambda(R_\mu f))(x)} = \int_0^\infty e^{-\lambda s}\,Q_s(R_\mu f)(x)\,ds = \int_0^\infty e^{-\lambda s}\,Q_s\left[\int_0^\infty e^{-\mu t}Q_tf\,dt\right](x)\,ds.$$

Expanding out the definition of $Q_s$, this is

$$\int_0^\infty e^{-\lambda s}\int_E\left[\int_0^\infty e^{-\mu t}Q_tf(y)\,dt\right]Q_s(x, dy)\,ds,$$

and now we can apply Fubini's theorem (because there are no integrability issues when we have bounded functions) to get

$$\int_0^\infty e^{-\lambda s}\int_0^\infty e^{-\mu t}\int_E Q_tf(y)\,Q_s(x, dy)\,dt\,ds.$$

Now the inner integral $\int_E Q_tf(y)\,Q_s(x, dy)$ means that we start at $x$ and evolve for time $s$, ending up at $y$, and finally evaluate $Q_tf$. This means we are at state $y$ and evolve for time $t$, and see what the value of $f$ looks like there, so this inner integral all just evaluates to $Q_{s+t}f(x)$. Thus, this can all be rewritten as

$$\int_0^\infty e^{-(\lambda - \mu)s}\int_0^\infty e^{-\mu(s+t)}\,Q_{s+t}f(x)\,dt\,ds,$$

where we've slipped in an $e^{\mu s - \mu s}$ to separate the two integrals. Setting $r = s + t$ turns this into

$$\int_0^\infty e^{-(\lambda - \mu)s}\int_s^\infty e^{-\mu r}\,Q_rf(x)\,dr\,ds,$$

which is a double integral over pairs $(r, s)$. Changing the order of integration and then evaluating the inner integral yields

$$\int_0^\infty e^{-\mu r}\,Q_rf(x)\int_0^r e^{-(\lambda - \mu)s}\,ds\,dr = \int_0^\infty Q_rf(x)\,\frac{e^{-\mu r} - e^{-\lambda r}}{\lambda - \mu}\,dr,$$

which is exactly $\frac{R_\mu - R_\lambda}{\lambda - \mu}f(x)$; comparing this to the boxed expression above yields the result.


This is mostly an algebraic manipulation — we won't use it today, but it will come up again in the next few lectures. 
The main idea is to become more familiar with the idea of composition of operators, and the key idea of the proof 
here was “composing” the operators using Chapman-Kolmogorov. 
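The resolvent equation is easy to test numerically when the state space is finite. The sketch below assumes a hypothetical 3-state semigroup $Q_t = e^{tG}$ built from a symmetric rate matrix `G` (rows summing to zero); this setup is an illustration and does not come from the lecture. Because `G` is symmetric, each eigenmode $e^{tw}$ has Laplace transform $\frac{1}{\lambda - w}$, which gives the resolvent in closed form.

```python
import numpy as np

# Hypothetical symmetric rate matrix (off-diagonals >= 0, rows sum to 0).
G = np.array([[-1.0, 1.0, 0.0],
              [1.0, -2.0, 1.0],
              [0.0, 1.0, -1.0]])
w, V = np.linalg.eigh(G)                 # G = V diag(w) V^T with w <= 0

def semigroup(t):
    """Q_t = exp(tG), computed mode by mode."""
    return V @ np.diag(np.exp(t * w)) @ V.T

def resolvent(lam):
    """R_lam = int_0^inf exp(-lam*t) Q_t dt; each mode integrates to 1/(lam - w)."""
    return V @ np.diag(1.0 / (lam - w)) @ V.T

lam, mu = 0.7, 2.5
R_lam, R_mu = resolvent(lam), resolvent(mu)
residual = R_lam - R_mu + (lam - mu) * R_lam @ R_mu

print(np.abs(residual).max())            # resolvent equation residual
print((lam * R_lam).sum(axis=1))         # rows of lam * R_lam sum to 1
```

The second printout reflects the fact that $\lambda R_\lambda$ is itself a Markov kernel.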

For our next step, we're going to need some more regularity: assume that our space $E$ is metrizable, locally compact (meaning that around any point, we can find a compact set that contains a neighborhood of the point), and $\sigma$-compact (meaning $E$ is a countable union of compact sets). In particular, this implies that $E$ is a Polish space. Examples of such spaces $E$ include open subsets of $\mathbb{R}^d$, as well as much more general spaces. Let $\mathcal{E}$ be the Borel $\sigma$-field of $E$, and write $E = \bigcup_{n=1}^\infty K_n$ (where the $K_n$ are compact and nested; this exists by assumption of being $\sigma$-compact).


Definition 139 


A function $f : E \to \mathbb{R}$ tends to zero at infinity if $\lim_{n \to \infty}\sup_{x \in E \setminus K_n}|f(x)| = 0$.


Remember that a Polish space is defined to be separable (containing a countable dense subset) and completely metrizable (topologically homeomorphic to a complete metric space). For example, if we take $E = (0, 1)$, this is locally compact and $\sigma$-compact (it's the countable union of the sets $[\frac1n, 1 - \frac1n]$), so it is a Polish space. Note that while $E$ is not a complete metric space under the usual metric, because the sequence $\frac1n$ doesn't converge to anything in $E$, it is instead completely metrizable, because $(0, 1)$ is topologically homeomorphic to $\mathbb{R}$. So being completely metrizable is a topological property, and we don't need to put a complete metric on the space. (In the language of the definition, “tending to zero at infinity” for the interval $(0, 1)$ means that we tend to 0 at the endpoints 0 and 1.)


We'll let $C_0(E)$ denote the set of continuous real functions on $E$ tending to zero at infinity. This is a subspace of $B(E)$, which is a Banach space with the sup norm, so we'll also look at $C_0(E)$ with the sup norm. And we now have enough of the topological setup to define the class of processes that we want:


Definition 140 


On a space $E$ satisfying the above conditions, a Feller semigroup is a transition semigroup $(Q_t)_{t \ge 0}$ such that

• $Q_t$ maps $C_0(E)$ into itself for all $t \ge 0$, and

• for all $f \in C_0(E)$, $\|Q_tf - f\| \to 0$ as $t \downarrow 0$.


A Feller process is a Markov process with a Feller semigroup. 


Remember that $Q_tf(x)$ is the expected value of $f$ at time $t$, given that we're at $x$ at time 0. So the Feller property tells us that the process doesn't make a large jump in a small amount of time, since the values of $Q_tf$ are close to the corresponding values of $f$. However, discontinuous jumps are still allowed, and many natural Feller processes do have jumps; it's just that we're not likely to make a jump immediately at any particular time. (We should think of having a process that evolves continuously for some interval and then makes a jump occasionally.)


Definition 141 

For a Feller semigroup $(Q_t)_{t \ge 0}$, let the domain of $L$, denoted $D(L)$, be the set of $f \in C_0(E)$ such that $\frac{Q_tf - f}{t}$ converges in $C_0(E)$ (in the sup-norm) as $t \downarrow 0$. Then we can define the (infinitesimal) generator $L$ of $(Q_t)_{t \ge 0}$ to be the operator $D(L) \to C_0(E)$ such that

$$Lf = \lim_{t \downarrow 0}\frac{Q_tf - f}{t}.$$


We'll spend the rest of the lecture on a heuristic preview of the material for next week. We should think of $L$ as the “derivative” of $Q_t$ at time $t = 0$, but by the Chapman-Kolmogorov equations we know that

$$\frac{d}{dt}Q_t = \lim_{s \to 0}\frac{Q_{t+s} - Q_t}{s} = \lim_{s \to 0}\frac{Q_sQ_t - Q_t}{s} = LQ_t.$$

We know that a real-valued function $q : [0, \infty) \to \mathbb{R}$ solving the differential equation $q' = \ell q$ with initial condition $q(0) = 1$ has unique solution $q(t) = e^{\ell t}$, and the Laplace transform of such a function is $r(\lambda) = \int_0^\infty e^{-\lambda t}q(t)\,dt = \frac{1}{\lambda - \ell}$ for $\lambda > \ell$. By analogy, we might guess that $Q_t$ is similarly an exponential of the form $Q_t = e^{tL} = \sum_{k \ge 0}\frac{(tL)^k}{k!}$, and that the resolvent is $(\lambda - L)^{-1}$. This isn't a rigorous argument, but it's our best guess, and our goal for next week will be to explore how valid this analogy is.


Example 142 


Consider Brownian motion on $\mathbb{R}^d$: we'll study it from the perspective of Feller processes.

For each $t$, $Q_t(x, \cdot)$ is the distribution $N(x, tI_{d \times d})$ (this is how the process evolves in time $t$ when started from $x$), so the generator of Brownian motion looks like

$$Lf(x) = \lim_{t \downarrow 0}\frac{E[f(x + B_t)] - f(x)}{t}.$$

We don't actually have all of the tools needed to evaluate this rigorously, but if $f$ is nice enough and we let our Brownian motion go for a small amount of time, $B_t$ is small and thus we should be able to Taylor expand. This means that

$$Lf(x) = \lim_{t \downarrow 0}\frac1t E\left[\nabla f(x) \cdot B_t + \frac12 B_t^T(\mathrm{Hess}\,f)(x)B_t\right],$$

and $B_t$ has mean 0 so the first term goes away in expectation. The Hessian of $f$ is a $d \times d$ matrix, and in this case we're only picking up the diagonal terms because the Brownian motion $B_t$ has independent entries in all $d$ dimensions (each with variance $t$). Thus, we'll get the trace of the Hessian, which is the Laplacian: $\boxed{Lf = \frac12\Delta f(x)}$. This highlights the connection between Dirichlet theory and Brownian motion, which we'll explore more soon!

We can also calculate $R_\lambda$ in this case, but the integral is more complicated: it turns out to be related to the Green kernel (the inverse of the Laplacian) when we take $\lambda \downarrow 0$, since we're saying in that case that the resolvent is the inverse of $L$.
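The heuristic $Lf = \frac12\Delta f$ can be checked numerically in one dimension. The sketch below is an illustration under stated assumptions: the test function $f = \sin$, the point $x = 0.4$, and the helper name `generator_quotient` are all arbitrary choices. It evaluates $E[f(x + B_t)]$ by Gauss-Hermite quadrature (using $B_t = \sqrt{t}Z$ with $Z \sim N(0, 1)$) and compares the difference quotient to $\frac12 f''(x)$ as $t$ shrinks.

```python
import numpy as np

def generator_quotient(f, x, t, deg=80):
    """(E[f(x + B_t)] - f(x)) / t for one-dimensional Brownian motion."""
    # Nodes/weights for the weight function exp(-z^2 / 2).
    z, w = np.polynomial.hermite_e.hermegauss(deg)
    w = w / np.sqrt(2 * np.pi)           # normalise to N(0, 1) probabilities
    expectation = np.sum(w * f(x + np.sqrt(t) * z))
    return (expectation - f(x)) / t

f = np.sin
x = 0.4
times = (1e-1, 1e-2, 1e-3)
quotients = [generator_quotient(f, x, t) for t in times]
target = -0.5 * np.sin(x)                # (1/2) f''(x) for f = sin

for t, q in zip(times, quotients):
    print(t, q, abs(q - target))
```

For this particular $f$ the expectation is known in closed form, $E[\sin(x + B_t)] = e^{-t/2}\sin x$, so the error in the quotient is of order $t$.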


16 April 13, 2020 


Remark 143. In the survey responses that we filled out, it was mentioned that some of us are having trouble seeing the slides during class. As a reminder, there's a Dropbox link at the bottom of the course webpage, which has slides basically synchronized with lecture.


We'll be talking more about Feller processes today, as well as a special case of these processes which is particularly 
simple. For review, say we have a Markov process $X_t$ with transition semigroup $(Q_t)_{t \ge 0}$. Recall that this means $Q_t(x, \cdot) = P(X_{s+t} \in \cdot \mid X_s = x, \mathcal{F}_s)$ for all $s, t \ge 0$ and $x \in E$. As discussed last time, we can view $Q_t$ as an operator on $B(E)$: given a bounded function $f$, we define the function $Q_tf$ via $(Q_tf)(x) = \int f(y)\,Q_t(x, dy) = E[f(X_{s+t}) \mid X_s = x]$. The Chapman-Kolmogorov equations tell us that $Q_{s+t} = Q_sQ_t$, and we can define a Laplace transform $R_\lambda = \int_0^\infty e^{-\lambda t}Q_t\,dt$. (Since $\lambda e^{-\lambda t}$ integrates to 1, $\lambda R_\lambda$ is a Markov kernel, meaning that multiplying by $\lambda$ makes it properly normalized.) We previously showed the resolvent equation $R_\lambda - R_\mu + (\lambda - \mu)R_\lambda R_\mu = 0$, which in particular shows that $R_\lambda$ and $R_\mu$ commute.

Last time, we defined Feller processes to be those such that $\|Q_tf - f\| \to 0$ as $t \downarrow 0$ for all $f \in C_0(E)$. We also defined the space $D(L)$ (the “domain” of $L$) to be the space of functions $f \in C_0(E)$ such that $\lim_{t \downarrow 0}\frac{Q_tf - f}{t}$ exists in $C_0(E)$ (with respect to the sup-norm topology). Our goal today is to understand how $L$ determines the semigroup $Q_t$.


Proposition 144 


$Q_t$ and $L$ commute with each other on $D(L)$ (the space on which they are both defined) for all $t$.

Proof. We first write out

$$Q_tLf = Q_t\left(\lim_{s \downarrow 0}\frac{Q_sf - f}{s}\right),$$

but now the limit is in the sup-norm and $Q_t$ is an operator which is contractive (the norm of $Q_tf$ is at most the norm of $f$). Therefore, we can also bring the $Q_t$ inside the limit to get

$$\lim_{s \downarrow 0}\frac{Q_tQ_sf - Q_tf}{s} = \lim_{s \downarrow 0}\frac{Q_s(Q_tf) - (Q_tf)}{s} = LQ_tf,$$

with the middle equality by Chapman-Kolmogorov, and this is exactly what we wanted to prove. (In particular, the limit on the right exists, so $Q_tf$ is again in $D(L)$.)


The next result we'll prove is a differential relation: 


Proposition 145 


For all $f \in D(L)$ and for all $t \ge 0$, we have $\int_0^t Q_sLf\,ds = Q_tf - f = \int_0^t LQ_sf\,ds$.

In other words, the time-derivative of $Q_tf$ is given by $Q_tLf$. The two equalities were historically proven at different times: they're known as the Kolmogorov forward and backward equations, respectively.


Proof. By the previous proposition, we just need to prove one of the two equalities. Fix a point $x \in E$, define $h(t) = Q_tf(x)$, and take the right-derivative of $h$. We see that

$$\lim_{s \downarrow 0}\frac1s\left[Q_{s+t}f(x) - Q_tf(x)\right] = LQ_tf(x).$$

But since $Q_t$ is contractive, this convergence happens uniformly in $x$. More explicitly, we have

$$\left\|Q_t\left(\frac1s(Q_s - I)f - Lf\right)\right\| \le \left\|\frac1s(Q_s - I)f - Lf\right\|,$$

and because the right-hand side converges to 0, the left-hand side does too. So we in fact have $h'(t) = LQ_tf(x)$ for all $x$, and integrating in $t$ yields the result that we want.


We can explain the names of “forward” and “backward” equation a bit more now. For a large class of (diffusive) processes, $L$ turns out to be a differential operator. (For example, we showed last time that $L = \frac12\Delta$ for a Brownian motion.)

• For the Kolmogorov forward equation, suppose we know the initial distribution $X_0 \sim \mu$ and we want to know how the forward evolution affects the law of the process at some final time. We know that $X_t \sim \mu Q_t(dy) = \int_{x \in E}\mu(dx)\,Q_t(x, dy)$. But because we know that $\frac{d}{dt}(\mu Q_t) = (\mu Q_t)L$, this gives us a partial differential equation: if $\mu Q_t$ has a density $p(t, x)$ in the $x$-coordinate at time $t$, then

$$\partial_t p(t, x) = L_x^* p(t, x),$$

where $L^*$ is the adjoint of $L$ acting in the $x$-variable (for Brownian motion, $L^* = L = \frac12\Delta$), and we can solve this PDE “forward in time” by using the initial condition $p(0, x)$ that comes from the initial density.

• Meanwhile, for the Kolmogorov backward equation, suppose we want to calculate the expected value $E[f(X_T) \mid X_0 = x] = (Q_Tf)(x)$. This time we'll make use of the other equation $\frac{d}{dt}Q_tf(x) = LQ_tf(x)$. If we write $f(t, x) = Q_tf(x)$, the defining partial differential equation is now

$$\partial_t f(t, x) = L_x f(t, x)$$

with $f(0, x) = f(x)$. Equivalently, $u(t, x) = f(T - t, x) = E[f(X_T) \mid X_t = x]$ satisfies $\partial_t u = -L_x u$, which we can solve “backward in time” from the final condition $u(T, x) = f(x)$.


We'll now return to the connection from last time between the resolvent and the generator: 


Proposition 146 
For a semigroup $Q_t$, the range of $R_\lambda$, denoted $\mathcal{R} = \{R_\lambda f : f \in C_0(E)\}$, doesn't depend on $\lambda$. Also, $\mathcal{R}$ is a dense subset of $C_0(E)$ in the sup-norm topology.


Proof. The resolvent equation can be rewritten as $R_\lambda = R_\mu(I - (\lambda - \mu)R_\lambda)$, so the range of $R_\lambda$ is contained in the range of $R_\mu$. But the roles of $\mu$ and $\lambda$ are interchangeable here, so that means the range of $R_\lambda$ is the same for all $\lambda$.

To show that $\mathcal{R}$ is dense, we can consider $\lambda R_\lambda f$ for any function $f \in C_0(E)$, which can be explicitly written out as

$$\lambda R_\lambda f = \int_0^\infty \lambda e^{-\lambda t}\,Q_tf\,dt.$$

This can be thought of as waiting an exponential (random variable) amount of time and evolving $f$ by that length, and this simplifies after a change of variables to $\int_0^\infty e^{-t}\,Q_{t/\lambda}f\,dt$. Now as $\lambda \to \infty$, $Q_{t/\lambda}f$ converges uniformly to $f$ by the Feller property and $e^{-t}$ is integrable. Thus by the dominated convergence theorem, this integral converges uniformly to $\int_0^\infty e^{-t}f\,dt = f$, as desired (we can approximate any $f$ by functions in $\mathcal{R}$ with vanishing sup-norm error).


Theorem 147 
For any Feller semigroup, $D(L) = \mathcal{R}$, and the two functions $R_\lambda : C_0(E) \to \mathcal{R}$ and $\lambda - L : D(L) \to C_0(E)$ are inverses of each other.


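Proof. It suffices to show that (1) $(\lambda - L)R_\lambda g = g$ for all $g \in C_0(E)$, and (2) $R_\lambda(\lambda - L)f = f$ for all $f \in D(L)$ (this will also show that the domain and range line up). For (1), we know that

$$LR_\lambda g = \lim_{s \downarrow 0}\frac1s(Q_s - I)R_\lambda g = \lim_{s \downarrow 0}\frac1s\left(Q_s\left(\int_0^\infty e^{-\lambda t}Q_tg\,dt\right) - \int_0^\infty e^{-\lambda t}Q_tg\,dt\right).$$

We can use Fubini's theorem to move the $Q_s$ inside the integral and then do a change of variables to simplify (after some rearranging, since replacing $t$ with $s + t$ shifts the bound) to

$$\lim_{s \downarrow 0}\frac1s\left(\int_0^\infty e^{-\lambda t}Q_{s+t}g\,dt - \int_0^\infty e^{-\lambda t}Q_tg\,dt\right) = \lim_{s \downarrow 0}\frac1s\left((e^{\lambda s} - 1)\int_0^\infty e^{-\lambda t}Q_tg\,dt - e^{\lambda s}\int_0^s e^{-\lambda t}Q_tg\,dt\right).$$

But this simplifies to $\lambda R_\lambda g - g$ as $s \downarrow 0$, and rearranging proves the claim. Now for (2), take any $f \in D(L)$ and apply the Kolmogorov forward equation to rewrite

$$\lambda R_\lambda f = \int_0^\infty \lambda e^{-\lambda t}Q_tf\,dt = \int_0^\infty \lambda e^{-\lambda t}\left(f + \int_0^t Q_sLf\,ds\right)dt.$$

But now the first term integrates out to $f$, and we can swap the order of integration on the second term, so this is also equal to

$$f + \int_0^\infty Q_sLf\int_s^\infty \lambda e^{-\lambda t}\,dt\,ds = f + \int_0^\infty e^{-\lambda s}Q_sLf\,ds = f + R_\lambda Lf,$$

and again comparing to the original expression yields the result.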


Corollary 148 
A Feller semigroup $Q_t$ is uniquely determined by its generator $L$ (though we do need to specify the domain $D(L)$ for which the limit is well-defined).

Proof. Let $g \in C_0(E)$ be a nonnegative function (which in particular means $Q_tg$ is nonnegative as well). Knowing the generator $L$ tells us $R_\lambda g = (\lambda - L)^{-1}g$ for all $\lambda$. But since $R_\lambda g(x) = \int_0^\infty e^{-\lambda t}(Q_tg)(x)\,dt$, this means we know the Laplace transform of $Q_tg(x)$ for all $\lambda$ and thus know $Q_tg(x)$ itself. This characterizes $Q_t$ for any nonnegative function, which is enough to characterize it for all of $C_0(E)$.


Our description here is less explicit than in the discrete case, in which we just said that $Q_n = P^n$. So it's natural to ask if we have something like $Q_t = \exp(tL)$, and that's what we'll discuss next. But first, we'll do an example:


Example 149 


Consider a Brownian motion in $\mathbb{R}^d$, for which we've already shown that $L = \frac12\Delta$.

Since $Q_t$ tells us about the probability of going from $x$ to $y$, we can write down the semigroup explicitly as a Gaussian density:

$$Q_t(x, dy) = \frac{1}{(2\pi t)^{d/2}}\exp\left(-\frac{|x - y|^2}{2t}\right)dy.$$

The resolvent is then

$$R_\lambda(x, dy) = \int_0^\infty e^{-\lambda t}Q_t(x, dy)\,dt = \int_0^\infty \frac{e^{-\lambda t}}{(2\pi t)^{d/2}}\exp\left(-\frac{|x - y|^2}{2t}\right)dt\,dy,$$

and we know in general that this will be equal to $(\lambda - L)^{-1}$. The inverse of the Laplacian is the Green kernel, so it's natural to plug in $\lambda = 0$ in this example. The resolvent will not always be defined at $\lambda = 0$ (we can't always evaluate the Laplace transform at 0), but in this case the integral converges as long as $d > 2$ (since for large $t$ the exponential term just approaches 1). Computing explicitly then indeed shows us that $R_\lambda(x, dy) = 2G(x, dy)$ at $\lambda = 0$. The reason this argument doesn't work well when $d = 1, 2$ is that the Brownian motion returns to each state infinitely often; it is still possible to get the classical Green kernel back, but we need to do some renormalization.

We'll now turn our attention to the question posed earlier about whether Q; = exp(tL) by thinking about the 


following situation: 


Definition 150 
Let $Y_n$ be a discrete-time Markov chain on a space $(E, \mathcal{E})$ with a transition kernel $P(x, dy)$. The canonical way to turn such a chain into a continuous time process is to let $N_t$ be a Poisson process of some rate $c$ (which means in particular that $N_t$ is an integer distributed according to $\mathrm{Pois}(ct)$), independent of $Y$. Then we construct a pseudo-Poisson process via $X_t = Y_{N_t}$.


Here, $X$ and $Y$ have the same trajectory up to a time change, and the only difference is that $Y$ jumps at integer times while $X$ jumps at random times $T_n$ (specifically, the increments $T_{n+1} - T_n$ are iid exponential random variables with rate $c$). Then the generator of such a process is related to the chance of making a jump in a time $t$, where $t$ is small. But the probability that an exponential clock rings in time $t$ is $1 - e^{-ct} \approx ct$, so

$$Lf(x) = \lim_{t \downarrow 0}\frac{E[f(X_t) \mid X_0 = x] - f(x)}{t} = c\int_E (f(y) - f(x))\,P(x, dy).$$

In operator notation, this says that $\boxed{L = c(P - I)}$, so if we know the jump rate and transition kernel of a discrete Markov chain, we can find the corresponding $L$ for the pseudo-Poisson process. The generator $L$ is also a bounded operator, since $\|Lf\| \le 2c\|f\|$ from the integral representation above, so we can indeed define

$$\exp(tL)f = \sum_{k=0}^\infty \frac{t^k}{k!}L^kf.$$

We wish to show that $e^{tL}$ is indeed the same as $Q_t$ for the continuous-time process, and we can show this by writing out

$$e^{tL} = e^{tc(P - I)} = e^{-ctI}e^{ctP} = e^{-ct}\sum_{k=0}^\infty \frac{(ctP)^k}{k!} = \sum_{k=0}^\infty \frac{e^{-ct}(ct)^k}{k!}P^k.$$

But now we can plug in the mass function for the Poisson distribution: this right-hand side is exactly $\sum_{k=0}^\infty P(N_t = k)P^k$. And this is exactly how the process should evolve in time $t$: we first figure out how many times we update the chain, and then we evolve via $P$ that many times. So we do indeed have $Q_t = e^{tL}$ in this simple case.
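The identity $e^{tL} = \sum_k P(N_t = k)P^k$ can be verified numerically for a small chain. The sketch below assumes a hypothetical 3-state transition matrix `P` and arbitrary values of the rate `c` and time `t`; the helper names `exp_series` and `poisson_mixture` are illustrative. Both sides are computed by truncated series, which is legitimate here because $L = c(P - I)$ is a bounded operator.

```python
import math
import numpy as np

# Hypothetical transition matrix for the embedded discrete-time chain Y_n.
P = np.array([[0.10, 0.60, 0.30],
              [0.40, 0.20, 0.40],
              [0.50, 0.25, 0.25]])
c, t = 2.0, 1.5
L = c * (P - np.eye(3))                  # generator of the pseudo-Poisson process

def exp_series(A, terms=80):
    """Truncated power series for exp(A) = sum_k A^k / k!."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k              # term now equals A^k / k!
        result = result + term
    return result

def poisson_mixture(P, rate, terms=80):
    """sum_k P(N = k) P^k with N ~ Poisson(rate)."""
    total = np.zeros_like(P)
    power = np.eye(P.shape[0])
    for k in range(terms):
        weight = math.exp(-rate) * rate**k / math.factorial(k)
        total += weight * power
        power = power @ P
    return total

Q_from_generator = exp_series(t * L)
Q_from_poisson = poisson_mixture(P, c * t)
print(np.abs(Q_from_generator - Q_from_poisson).max())
```

The truncation at 80 terms is a convenience for this example (where $ct = 3$); larger values of $ct$ would need more terms.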
We'll finish by discussing the Yosida approximation theorem. Suppose we're back in the general case with a Feller semigroup $(Q_t)_{t \ge 0}$ and a generator $L$. Since we have no control on the boundedness of $L$, we can't always define $\exp(tL)$, but we can define a $\lambda$-approximation (remembering that $\lambda R_\lambda$ “doesn't evolve very much” for large $\lambda$)

$$L^{(\lambda)} = \lambda LR_\lambda = \lambda(\lambda - (\lambda - L))R_\lambda = \boxed{\lambda(\lambda R_\lambda - I)},$$

where we've used the fact that $\lambda - L$ is the inverse of $R_\lambda$. (A similar argument shows that this operator can also be written as $\lambda R_\lambda L$.) But here, the boxed expression looks a lot like $L = c(P - I)$, so $L^{(\lambda)}$ is the generator of a pseudo-Poisson process with transition kernel $\lambda R_\lambda$ and rate $\lambda$. We'll let $Q_t^{(\lambda)} = \exp(tL^{(\lambda)})$ denote the semigroup with this generator $L^{(\lambda)}$.


Theorem 151 (Yosida approximation theorem) 


For all $f \in D(L)$, $\|L^{(\lambda)} f - Lf\|$ converges to 0 in the sup-norm as $\lambda \to \infty$. In addition,

$$\|Q_t^{(\lambda)} f - Q_t f\| \le t \, \|L^{(\lambda)} f - Lf\|,$$

so the left-hand side converges on any bounded interval. In fact, for all $f \in C_0(E)$ (a larger set than $D(L)$), $\|Q_t^{(\lambda)} f - Q_t f\|$ converges to 0 on any bounded time interval.


So this $\lambda$-approximation idea gives us one way to approximate a Markov process, which is to use simple processes of the form $Q_t^{(\lambda)}$ and take $\lambda \to \infty$. But at the end of the day, we're interested in the process, not the semigroup, so we're curious about whether the processes with semigroups $(Q_t^{(\lambda)})_{t \ge 0}$ converge in law to the process with semigroup $Q_t$ as well. It turns out that this holds


somewhat generally, and we might talk about this later on. 
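In a finite state space every generator is bounded, but the formula $L^{(\lambda)} = \lambda(\lambda R_\lambda - I)$ still makes sense and gives a quick sanity check that $L^{(\lambda)} \to L$. The sketch below assumes a hypothetical two-state generator `L` (the rates are made up for illustration) and inverts the $2 \times 2$ matrix $\lambda - L$ by hand.

```python
def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def yosida_generator(L, lam):
    """L^(lam) = lam * (lam * R_lam - I), where R_lam = (lam * I - L)^(-1)."""
    resolvent = inverse_2x2([[lam - L[0][0], -L[0][1]],
                             [-L[1][0], lam - L[1][1]]])
    return [[lam * (lam * resolvent[i][j] - (1.0 if i == j else 0.0))
             for j in range(2)] for i in range(2)]


def max_entry_gap(L, lam):
    approx = yosida_generator(L, lam)
    return max(abs(approx[i][j] - L[i][j]) for i in range(2) for j in range(2))


# Hypothetical generator of a two-state chain (rows sum to zero).
L = [[-1.0, 1.0], [2.0, -2.0]]
gaps = [max_entry_gap(L, lam) for lam in (10.0, 100.0, 1000.0)]
```

For this example the entries of `gaps` shrink roughly like $1/\lambda$, matching the statement that $L^{(\lambda)} f \to Lf$ as $\lambda \to \infty$.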
We'll cover three separate topics today, concluding our discussion of Markov processes. Our first is sample path
regularity for Markov processes, which will provide some nice connections with martingales. Last time, we discussed 
that the resolvent and generator are related for a Feller process, and the other reason that this resolvent is important 


is that we can construct a supermartingale with it: 


Lemma 152 


Let $X_t$ be a Markov process with semigroup $(Q_t)_{t \ge 0}$ and resolvent $R_\lambda$. For any bounded nonnegative function $h \in B(E)$, the process $S_t = e^{-\lambda t} R_\lambda h(X_t)$ is a supermartingale for all $\lambda > 0$.

Here, we're using the function $h$ to go from an abstract space $E$ to the reals, so that we can do things like addition


and subtraction. 


Proof. Clearly $S_t$ is a nonnegative process, and we can check that $S_t \in L^1$ for all $t$. Indeed, $\lambda R_\lambda$ is a bounded (Markov transition) operator, meaning that $\|\lambda R_\lambda h\| \le \|h\|$, so there are no integrability issues at any finite time $t$. To check the supermartingale property, we write

$$E[S_{s+t} \mid \mathcal{F}_s] = e^{-\lambda(s+t)} E[R_\lambda h(X_{s+t}) \mid \mathcal{F}_s] = e^{-\lambda(s+t)} E\left[ \int_0^\infty e^{-\lambda r} Q_r h(X_{s+t}) \, dr \,\middle|\, \mathcal{F}_s \right].$$

We can move the expectation inside the integral by Fubini's theorem and then evaluate that expectation using the definition of the semigroup, so this simplifies to

$$e^{-\lambda(s+t)} \int_0^\infty e^{-\lambda r} E[Q_r h(X_{s+t}) \mid \mathcal{F}_s] \, dr = e^{-\lambda(s+t)} \int_0^\infty e^{-\lambda r} Q_t Q_r h(X_s) \, dr.$$

Applying Chapman-Kolmogorov and then performing a change of variables, we end up with

$$e^{-\lambda(s+t)} \int_0^\infty e^{-\lambda r} Q_{t+r} h(X_s) \, dr = e^{-\lambda s} \int_t^\infty e^{-\lambda r} Q_r h(X_s) \, dr.$$

But now $h$ is a nonnegative function, meaning the integrand is always nonnegative, and thus this last expression is bounded from above by $e^{-\lambda s} \int_0^\infty e^{-\lambda r} Q_r h(X_s) \, dr = e^{-\lambda s} R_\lambda h(X_s) = S_s$, verifying the supermartingale inequality.


Earlier in the class (chapter 3), we proved that when our filtration is right-continuous and complete and $X_t$ is a supermartingale such that $t \mapsto E[X_t]$ is right-continuous, $X$ has an rcll modification $\tilde{X}$ which is also a supermartingale.


We'll take advantage of this to prove that Markov processes have a similar property: 


Theorem 153 
Suppose $(\mathcal{F}_t)$ is right-continuous and complete on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ and $X_t$ is a Feller process with semigroup $Q_t$. Then $X$ has a modification $\tilde{X}$ which is also a Markov process with semigroup $Q_t$ and


rcll sample paths. 


Proving this directly is not so straightforward: the crucial fact that we used to prove the original result about martingales was Doob's upcrossing inequality (which gave us control over right and left limits). Markov processes don't necessarily have upcrossings in generic spaces $E$, but this result shows that working with nice enough (Feller)


processes still gives us enough control. 


Proof sketch. We'll assume first that $E$ is compact, so that $C_0(E) = C(E)$ (in other words, the semigroup is defined on all continuous functions). As a (topological) exercise, there exists a countable subset $\{f_n\} \subset C(E)$ which separates the points of $E$, meaning that for all $x \ne y$, there is some $n$ such that $f_n(x) \ne f_n(y)$. Consider the countable set

$$H = \{R_p f_n : p, n \in \mathbb{N}\}.$$

We showed last time that $\lambda R_\lambda$ converges to the identity as $\lambda \to \infty$, and thus $H$ also separates the points of $E$ (for any $x, y$ that are distinct, we can find an $f_n$ such that $f_n(x) \ne f_n(y)$, and then we can pick sufficiently large $p$ so that $R_p f_n(x) \ne R_p f_n(y)$). Now for any $h = R_p f_n \in H$, define the process

$$S_t^h = e^{-pt} h(X_t) = e^{-pt} R_p f_n(X_t).$$

This is a supermartingale by Lemma 152, and the Feller property tells us that the expected value $t \mapsto E[S_t^h]$ is right-continuous. Thus, our arguments from chapter 3 show that there is a modification of $S^h$ which is rcll, and now we can simultaneously define the countably many modifications $\tilde{S}^h$ for all $h \in H$. To finish, we take a countable dense subset $D \subset [0, \infty)$ and take the limits

$$\lim_{s \downarrow t, \, s \in D} X_s(\omega), \qquad \lim_{s \uparrow t, \, s \in D} X_s(\omega).$$

(Remember that the $S_t^h$ are real-valued, while the $X_t$ are $E$-valued; the claim we're making is that these limits above exist in $E$.) Indeed, we would violate the rcll property for some supermartingale $\tilde{S}^h$ if these limits didn't exist: if there were two sequences $s_n \downarrow t$ and $s_n' \downarrow t$ (taking values in $D$), where $X_{s_n}(\omega) \to x$ and $X_{s_n'}(\omega) \to y$ with $x \ne y$, then letting $h$ separate $x$ and $y$, we would find that $S^h$ has two different limits along these two sequences, which contradicts the existence of its right limit at $t$. Thus, all of the limits do exist and $X$ has the desired rcll modification.

In general, if $E$ is not a compact space, we can use a one-point compactification. Applying the above argument to $E \cup \{\Delta\}$ gives an rcll modification $\tilde{X}$ on the larger state space, and we just need to show that $\tilde{X}$ does not visit this extra point $\Delta$. This basically follows by using the Markov semigroup to verify that $e^{-t} h(X_t)$ has hitting times of the level $\frac{1}{n}$ going to infinity as $n \to \infty$.
Theorem 154


Under the same setting as the theorem above, the process X satisfies the strong Markov property. 


We won't say anything more about this here; the main argument is that it’s a Markov process, so it satisfies the 
simple Markov property, and then we can approximate random times by a discrete set of possibilities, where being a 
Feller process helps with continuity. 

We'll now move on to our next topic, Lévy processes. These processes don’t have to be continuous, so they 
aren't covered as much in our textbook, but there’s a large field of research about them — we can see [4] for more 
information. In short, this is a specific class of Feller processes (including Brownian motion and Poisson processes) 


which are “spatially homogeneous.” 


Definition 155 


A Lévy process is a real-valued process $X_t$ with stationary and independent increments (meaning that for all $s < t$, $X_t - X_s$ is independent of $\mathcal{F}_s$ and $X_t - X_s$ is equidistributed as $X_{t-s}$), such that $X_t$ converges in probability to $X_0 = 0$ as $t \to 0$.


To understand how the spatial homogeneity point affects our situation here, we may write

$$Q_t(x, dy) = P(X_{s+t} \in dy \mid X_s = x) = P(X_{s+t} - X_s \in d(y - X_s) \mid X_s = x) = Q_t(d(y - x)),$$

since the conditioning does not affect our probability. In other words, this is a Markov process where deciding how we'll move at the next step is independent of our current position. There are a lot of known facts about these processes, and we'll talk today about the characterizing features of a Lévy process. The characteristic function for $X_t$ can be written as

$$E[e^{i\theta X_t}] = E\left[ e^{i\theta (X_{t/2} + (X_t - X_{t/2}))} \right],$$

and the increments $X_{t/2}$ and $X_t - X_{t/2}$ are iid, so we can factor this. In fact, we can break this up into arbitrarily small chunks, and the idea is that we end up with a characteristic function of the form $E[e^{i\theta X_t}] = e^{t\psi(\theta)}$. Such a process does not have a lot of degrees of freedom: for example, $X_t$ is infinitely divisible, meaning

$$X_t = X_{t/k} + (X_{2t/k} - X_{t/k}) + \cdots + (X_t - X_{t - t/k}),$$

where all $k$ terms on the right side are iid increments. The class of infinitely divisible processes is indeed somewhat restricted:


Theorem 156 


A Lévy process is characterized by a triple $(a, \sigma^2, \nu)$ (a drift term, a rate of diffusion, and a jump measure), where $\nu$ is a possibly infinite (nonnegative) measure on $\mathbb{R} \setminus \{0\}$ such that $\int \min(1, x^2) \, \nu(dx) < \infty$.


Proof sketch. First of all, Lévy processes are Feller processes (we can see our textbook for this), so we can assume that we're working with an rcll modification $X_t$. Such a process $X_t$ has countably many discontinuities, which are the points where $\Delta X_t = X_t - X_{t-} \ne 0$. Take the empirical measure of the jumps

$$\eta = \sum_t \delta_{(t, \Delta X_t)}$$

(we can think of this as drawing points in the $(t, \Delta X_t)$ plane corresponding to the jumps), so that $\eta$ is a random measure on $[0, \infty) \times (\mathbb{R} \setminus \{0\})$. Because our process has stationary and independent increments, $\eta$ must be independent on distinct time blocks, meaning that $\eta$ must be a Poisson random measure with intensity $E[\eta] = \mathrm{Leb} \otimes \nu$. So $\nu$ is the distribution of the jumps, and that accounts for the discontinuity.

From here, we want to remove the jumps and end up with a continuous process, and the idea is that what we end up with is basically a Brownian motion. Ideally, we'd subtract off $\sum_{s \le t} \Delta X_s$ (there are only countably many jumps), but we don't know if this sum is convergent. Instead, we subtract off all jumps that are large: the rcll property tells us that there are only finitely many large jumps, so we can define $J_t = \sum_{s \le t} \Delta X_s 1\{|\Delta X_s| > 1\}$. Now let $\epsilon \in (0, 1]$, and define

$$M_t^\epsilon = \sum_{s \le t} (\Delta X_s - E \Delta X_s) 1\{|\Delta X_s| \in (\epsilon, 1]\}.$$

This is also well-defined for any positive $\epsilon$, and on our homework we'll show that $M_t^\epsilon$ has a well-defined limit $M_t$ as $\epsilon \to 0$. (We'll need the second-moment property on $\nu$, and then we'll need the martingale $L^2$ inequality.) So now the process $Y = X - M - J$ is a continuous Lévy process which will turn out to be of the form $at + \sigma B_t$. The issue is that we've assumed nothing about integrability in the definition of a Lévy process, and we'll see how to fill in the details on the homework.


Again, the significant point here is that Lévy processes only do two things: the continuous part evolves as a 


Brownian motion, and the discontinuous part has jumps evolving at a Poissonian rate. 
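To make the "drift plus Brownian motion plus Poissonian jumps" picture concrete, here is a minimal simulation sketch. It only handles a finite jump measure (a compound Poisson part with total rate `jump_rate`), so the compensated small-jump martingale $M$ is not modeled; the function name, the time discretization, and all parameters are illustrative assumptions.

```python
import math
import random


def simulate_levy_path(a, sigma, jump_rate, jump_sampler, t_max, n_steps, rng):
    """Sketch of X_t = a*t + sigma*B_t + (compound Poisson jumps) on a time grid."""
    dt = t_max / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # Continuous part: drift plus a Gaussian increment with variance sigma^2 * dt.
        increment = a * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: exponential clocks of rate jump_rate that ring inside this step.
        clock = rng.expovariate(jump_rate) if jump_rate > 0 else math.inf
        while clock < dt:
            increment += jump_sampler(rng)
            clock += rng.expovariate(jump_rate)
        path.append(path[-1] + increment)
    return path


# Pure drift: the path is deterministic and ends at a * t_max.
drift_only = simulate_levy_path(1.5, 0.0, 0.0, lambda r: 0.0, 2.0, 100, random.Random(0))

# Pure unit jumps at rate 3: a Poisson process sampled on the grid.
jumps_only = simulate_levy_path(0.0, 0.0, 3.0, lambda r: 1.0, 2.0, 100, random.Random(1))
```

Restarting the exponential clock at the start of every grid step is harmless here because of the memoryless property, so the jump counts over disjoint steps are independent Poisson variables.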


Theorem 157 


Let $\xi$ be a real-valued random variable. Then the following are equivalent:

- There exists a Lévy process such that $X_1$ is equidistributed as $\xi$,

- The law of $\xi$ is infinitely divisible,

- There exists a triangular array $\xi_{n,j}$ such that each row is an iid sequence of length $m_n$, and $\sum_{j=1}^{m_n} \xi_{n,j} \xrightarrow{d} \xi$.


This is an important result, because one idea from last semester (Lindeberg-Feller) is that such a triangular array with mild conditions forces $\xi$ to be a normal random variable. But we also know that there are non-Gaussian infinitely divisible random variables (such as the Cauchy, gamma, and Poisson distributions) for which we violate the
Lindeberg-Feller conditions. In particular, in such situations the random variables in our array are “heavy-tailed,” and 
this theorem is interesting because it covers all iid triangular arrays with row sums converging to a limit (in contrast 
with Lindeberg-Feller). 

We'll close with a brief note about our final topic, approximation of Markov chains. Recall that last time, we had a Feller semigroup $Q_t$ with generator $L$, and we mentioned that $L^{(\lambda)} = \lambda(\lambda R_\lambda - I)$ generates a pseudo-Poisson process with an associated semigroup $Q_t^{(\lambda)} = \exp(tL^{(\lambda)})$. The Yosida approximation theorem then stated that $Q_t^{(\lambda)} \to Q_t$ as $\lambda \to \infty$, and now if $X^{(\lambda)}$ is such a realization (that is, a process with semigroup $Q_t^{(\lambda)}$), we want to know whether $X^{(\lambda)}$ converges in distribution to $X$.

One example this can help us understand is whether a simple random walk converges in distribution to Brownian motion. And another example comes from the fact that the average of $n$ Cauchy random variables is Cauchy (we can show this by looking at the characteristic function): we may ask whether the process $X_t^{(n)} = S_{\lfloor nt \rfloor}/n$ (where $S_k$ is the sum of $k$ iid Cauchy random variables) converges in distribution to a continuous-time analog (namely the Lévy process with $X_1$ distributed according to the Cauchy distribution), given that $X_t^{(n)}$ converges to $t \cdot (\text{Cauchy})$ for any fixed $t$.

The first issue we need to worry about is the topology for convergence in distribution: if we have an rcll process that can also have jumps, the sup-norm topology is not a good choice anymore. For example, the process $X^{(n)}(t)$ which is 0 until time $1 + \frac{1}{n}$ and then takes on the value 1 after that should converge to the process which jumps up to 1 at time 1, but this isn't true in the sup-norm topology. Instead, we use the Skorohod topology, which allows us
to slightly reparameterize the time. And from here, the idea is that semigroup convergence is equivalent to weak 
convergence in the Skorohod topology for Feller processes. It turns out that semigroup convergence is easier to 


show in these kinds of situations (we can check each Q; on its own) than showing convergence in law directly. 
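The step-function example can be checked directly on a grid. In this sketch (the grid, the helper names, and the linear time change $t \mapsto (1 + \frac{1}{n}) t$ are all illustrative choices), the sup-norm distance between $X^{(n)}$ and the limit stays equal to 1, while after a time change that moves every time by at most $3/n$ on $[0, 3]$, the two paths agree exactly.

```python
def step_path(jump_time):
    """Right-continuous path: 0 before jump_time, 1 from jump_time onward."""
    return lambda t: 1.0 if t >= jump_time else 0.0


def sup_distance(f, g, grid):
    return max(abs(f(t) - g(t)) for t in grid)


grid = [k / 1000 for k in range(3001)]  # time grid on [0, 3]; contains t = 1 exactly
limit_path = step_path(1.0)
ns = (1, 10, 100)

# Plain sup-norm distance between X^(n) and the limit path.
sup_gaps = [sup_distance(step_path(1.0 + 1.0 / n), limit_path, grid) for n in ns]

# Distance after composing X^(n) with the time change t -> (1 + 1/n) * t.
reparam_gaps = [
    sup_distance(lambda t, n=n: step_path(1.0 + 1.0 / n)((1.0 + 1.0 / n) * t), limit_path, grid)
    for n in ns
]

# How far the time change moves any time point of the grid.
time_distortion = [max(abs((1.0 + 1.0 / n) * t - t) for t in grid) for n in ns]
```

The pair (`reparam_gaps`, `time_distortion`) is the kind of quantity the Skorohod topology controls: both can be made small simultaneously, even though `sup_gaps` cannot.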
We'll quickly finish discussing Markov processes today (chapter 6) and move on to some preparation for potential theory (chapter 7). Recall that we started by discussing Markov chains in discrete time in a finite state space, where the chain is completely specified by a transition matrix $P$. Under mild conditions (being irreducible and aperiodic), the Perron-Frobenius theorem guarantees convergence in law to a unique stationary distribution $\pi^*$ (that is, $\pi^* P = \pi^*$). In continuous time, the dynamics of the system are now specified by a semigroup $Q_t$, and in a particularly nice class of processes known as Feller processes, we have a generator $L$ that determines the semigroup, and we know that the $\lambda$-resolvent satisfies $R_\lambda = (\lambda - L)^{-1}$.

A further subclass of these Feller processes is the space of pseudo-Poisson processes, where the generator $L$ just looks like $c(P - I)$: in such a case, $Q_t = \exp(tL)$, and the Yosida approximation tells us that a Feller process can be approximated by these pseudo-Poisson processes. Finally, a different subclass of the Feller processes is the Lévy processes, which include standard Brownian motion $B_t$, the standard Poisson process $N_t$, and the first passage time process $T_a$ (that is, the infimum of the times $t$ such that $B_t > a$). This last process $(T_a)_{a \ge 0}$ is indeed a Lévy process because it has independent increments, and we can completely characterize its behavior:


Fact 158 


We have $T_a = \int_{[0,a] \times [0,\infty)} x \, \eta(ds \, dx)$, where $\eta$ is a Poisson random measure with intensity $\mathrm{Leb} \otimes \frac{1\{x > 0\}}{\sqrt{2\pi x^3}} \, dx$.


This is a special case of the processes on our homework — it’s good to see this worked out if we haven't seen 
anything like it before, and the details are on the online notes. 

With this, we'll move on to potential theory — we'll first cover the subject in a discrete setting because there's 
enough going on already in that case. The central idea of what we'll be doing is relating Markov chains to Dirichlet 


problems. 


Definition 159 


A Markov chain $Y_n$ evolving on a discrete state space $V$ is reversible if there exists a symmetric function $c : V \times V \to [0, \infty)$ such that $p(x, y) = \frac{c(x, y)}{c(x)}$, where $c(x) = \sum_{y} c(x, y)$.


We'll only discuss reversible Markov chains here. Any such chain can be described by a weighted graph $G = (V, E, c)$, where the vertices are the elements of the state space and the (undirected) edges are of the form $(xy)$, where $c(x, y) > 0$. (For any edge $e$, $c(e)$ is called its conductance.) We can also define the weighted adjacency matrix $A = \{c(x, y)\}$; assuming no self-loops, the matrix has zeros on the diagonal. If we let $D$ be the diagonal matrix such that $D(x, x) = c(x)$ (this is like the degree of $x$ in the unweighted case), then the transition matrix of the chain is $P = D^{-1}A$. For a function $u : V \to \mathbb{R}$, consider the function

$$Lu(x) = E_x[u(Y_1) - u(Y_0)].$$

This is a discrete-time version of the generator, measuring the expected change in one step of the chain given that we started the chain at $x$. But this is just $(Pu)(x) - u(x) = (P - I)u(x)$, and we call $L = P - I$ the weighted graph Laplacian. In particular, we can also write this out as $L = P - I = D^{-1}A - I = D^{-1}(A - D)$.
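As a concrete instance of $P = D^{-1}A$ and $L = P - I$, the sketch below builds both matrices from a symmetric conductance matrix. The helper name `chain_from_conductances` and the three-vertex example are assumptions made for illustration.

```python
def chain_from_conductances(c):
    """Return (c(x), P, L) for a symmetric conductance matrix c with no self-loops."""
    n = len(c)
    degree = [sum(row) for row in c]  # c(x) = sum_y c(x, y)
    P = [[c[x][y] / degree[x] for y in range(n)] for x in range(n)]  # P = D^{-1} A
    L = [[P[x][y] - (1.0 if x == y else 0.0) for y in range(n)] for x in range(n)]  # L = P - I
    return degree, P, L


# Hypothetical triangle graph with conductances c(0,1) = 1, c(0,2) = 2, c(1,2) = 3.
c = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
degree, P, L = chain_from_conductances(c)
```

Reversibility shows up as the detailed balance relation $c(x) p(x, y) = c(y) p(y, x)$, and each row of $L$ sums to zero because constants are harmonic.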

Remark 160. To understand the name “graph Laplacian,” consider the case where we have a random walk on $G = \mathbb{Z}^d$. Then $x$ is a point in $\mathbb{Z}^d$, and there are $2d$ possibilities for where we can go, and thus

$$Lu(x) = \frac{1}{2d} \sum_{i=1}^{d} \left[ (u(x + e_i) - u(x)) - (u(x) - u(x - e_i)) \right].$$

This is a difference of first derivatives, so this behaves like a second derivative in the $i$th coordinate (summed over $i$), and thus $L$ acts basically like $\frac{1}{2d} \sum_{i=1}^{d} \partial_i^2$.
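The lattice formula is easy to test on polynomials, where second differences are exact. This is a small sketch (the function name `lattice_laplacian` and the test functions are illustrative): for $u(x) = \sum_i x_i^2$ each coordinate contributes a second difference of 2, so $Lu = 1$, in agreement with $\frac{1}{2d} \Delta u = \frac{2d}{2d}$.

```python
def lattice_laplacian(u, x):
    """(1 / 2d) * sum_i [(u(x + e_i) - u(x)) - (u(x) - u(x - e_i))] on Z^d."""
    d = len(x)
    total = 0.0
    for i in range(d):
        up = list(x)
        up[i] += 1
        down = list(x)
        down[i] -= 1
        total += (u(up) - u(x)) - (u(x) - u(down))
    return total / (2 * d)


sum_of_squares = lambda x: sum(v * v for v in x)   # not harmonic
product = lambda x: x[0] * x[1]                    # harmonic in d = 2
saddle = lambda x: x[0] * x[0] - x[1] * x[1]       # harmonic in d = 2

value_squares = lattice_laplacian(sum_of_squares, [3, -2, 5])
value_product = lattice_laplacian(product, [3, -2])
value_saddle = lattice_laplacian(saddle, [3, -2])
```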


From here on, we'll assume that G is a finite connected graph, and we'll be interested in potential functions (also 
called harmonic functions), which are essentially functions u such that Lu = 0. First note that if we actually require 


$Lu = 0$ to hold everywhere, then $u$ must be constant. Indeed, take any $x \in \arg\max(u)$; since $u$ is harmonic,

$$Lu(x) = 0 \implies u(x) = E_x[u(Y_1)].$$


But u(x) is the largest possible value of u, and the right hand side is a weighted average of values of u, so this means 
u(y) = u(x) for all y adjacent to x. Continuing throughout the connected graph, this means that u is constant, as 
claimed — this is known as the maximum principle. 

So to make things more interesting, we'll take some subset B of the vertices V, which we call the boundary B, 


and we won't require the function u to be harmonic on B. Then we get a more general maximum principle: 


Lemma 161 
Let $G$ be a finite connected graph with boundary $B$ and interior $U = V \setminus B$, and let $u : V \to \mathbb{R}$ be a function such that $(Lu)|_U = 0$ and $u|_B = 0$ (harmonic on the interior, zero on the boundary). Then $u \equiv 0$.

Proof. Again consider $x \in \arg\max_{x \in V} u(x)$. If $x$ lies in the interior, then $u(x)$ is still a weighted average of the values of $u$ around it, so we again have $u(y) = u(x)$ for all $y \sim x$. Continuing in this way, we eventually reach the boundary (where the value is fixed to be zero), so the maximum of $u$ is zero. Applying the same argument to $-u$ shows that the minimum of $u$ is zero as well, and this tells us that everything must be zero.


Definition 162 


A Dirichlet boundary value problem consists of finding a harmonic function $u : V \to \mathbb{R}$ such that $(Lu)|_U = 0$ and $u|_B = f$ for some boundary condition $f$. If such a $u$ exists, the solution is called the harmonic interpolation of $f$.

Note that if we have two solutions $u', u''$ with the same boundary data $f$, then $u' - u''$ is still a harmonic function but equal to zero on the boundary. Thus Lemma 161 tells us that $u' - u'' = 0$; in other words, the harmonic interpolation must be unique. To show existence, we can write down a solution directly by defining

$$u(x) = E_x[f(Y_\tau)],$$

where $\tau$ is the first hitting time of the boundary. This has the correct boundary conditions, because $\tau = 0$ for any $x \in B$, meaning that $u(x) = f(x)$, and we can check for ourselves that $Lu(x) = 0$ for all $x \in U$. (This means there is
always a unique harmonic interpolation from the boundary to the rest of the graph, as long as both the boundary and 


the interior are nonempty. And the boundary does not need to correspond in any visual sense to an actual boundary.) 
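One way to see the harmonic interpolation concretely is to compute it by repeated local averaging. The sketch below is a plain Gauss-Seidel iteration, which is a numerical stand-in and not the probabilistic formula $u(x) = E_x[f(Y_\tau)]$ itself; the function name, the sweep count, and the path-graph example are assumptions. On the path $0 - 1 - 2 - 3 - 4$ with $f(0) = 0$ and $f(4) = 1$, the answer should be the gambler's ruin probability $k/4$.

```python
def harmonic_interpolation(c, boundary_values, sweeps=2000):
    """Iterate u(x) <- sum_y p(x, y) u(y) on interior vertices until (approximately) harmonic."""
    n = len(c)
    u = [boundary_values.get(x, 0.0) for x in range(n)]
    for _ in range(sweeps):
        for x in range(n):
            if x not in boundary_values:
                # Weighted average of the neighbors, with weights p(x, y) = c(x, y) / c(x).
                u[x] = sum(c[x][y] * u[y] for y in range(n)) / sum(c[x])
    return u


# Path graph on 5 vertices with unit conductances; boundary B = {0, 4}.
n = 5
c = [[1.0 if abs(x - y) == 1 else 0.0 for y in range(n)] for x in range(n)]
u = harmonic_interpolation(c, {0: 0.0, 4: 1.0})
```

Uniqueness from Lemma 161 is what guarantees that any method producing a function that is harmonic on $U$ with the right boundary values must agree with $E_x[f(Y_\tau)]$.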
Example 163
The reason for the name “potential function” is that we can view $G = (V, E, c)$ as a wiring diagram of an electrical network. Think of the vertices $V$ as nodes and edges $E$ as wire connections with an associated number $c(e)$, representing the electrical conductance of $e$. (In particular, $r(e) = \frac{1}{c(e)}$ is the electrical resistance we might have seen in a physics class.)


There are two main facts to know about electrical networks: 


* (Ohm's law) If we hold two vertices $x$ and $y$ at fixed voltages $v(x)$ and $v(y)$ (for instance, the two ends of a battery), then that imposed voltage difference creates an electrical current

$$i(x, y) = \frac{v(x) - v(y)}{r(x, y)} = c(x, y)(v(x) - v(y)) = c(x) p(x, y) (v(x) - v(y)).$$

(This is how we define the current $i$, and we'll take the convention that everything “flows downhill” from positive to negative voltage.) Now if we have a voltage function $v : V \to \mathbb{R}$ which tells us what the voltage is at each vertex, then the net current into the node $x$ from its neighbors $y$ is $(\operatorname{div} i)(x) = \sum_{y \sim x} i(y, x)$. Substituting the previous expression in and using that $c(x) p(x, y) = c(y) p(y, x)$, we thus have

$$(\operatorname{div} i)(x) = c(x) \sum_{y \sim x} p(x, y)(v(y) - v(x)) = c(x) Lv(x).$$

* (Kirchhoff's node law) If $x$ is not connected to an external electrical source or sink, then the current into $x$ is the same as the current out of $x$. (If electrons flow in, they also need to flow out.) In other words, this means that $\boxed{Lv(x) = 0}$ (the voltage function should be harmonic) everywhere other than the sources and sinks.


This means that the solution to the Dirichlet boundary value problem with boundary condition $f : B \to \mathbb{R}$ is exactly the voltage function $v : V \to \mathbb{R}$ that arises if we impose voltages $f$ on $B$. This is a nice view, because it's easy to calculate properties of electrical networks: for example, having two edges in parallel with conductance $c_1$ and $c_2$ is equivalent to a single edge with conductance $c_1 + c_2$, and having two edges in series with resistance $r_1$ and $r_2$ yields a single edge with resistance $r_1 + r_2$. So there are various rules for reducing a network, and sometimes this reduction allows us to
take a complicated network and reduce to a single wire with an effective conductance (if we, for example, impose a 


voltage of 1 on one vertex and a voltage of 0 on another). We'll make this more formal: 


Definition 164 
Suppose our boundary $B \subseteq V$ is partitioned as $A \sqcup Z$, where $A$ is the set of sources and $Z$ is the set of sinks. If we have the boundary condition $f = 1_A$ (think of this as connecting the positive end of a battery to $A$ and the negative end to $Z$), then define the total current and effective resistance via

$$I(A \to Z) = -\sum_{x \in A} (\operatorname{div} i)(x) = \sum_{x \in Z} (\operatorname{div} i)(x), \qquad R_{\text{eff}}(A \to Z) = \frac{1}{I(A \to Z)} = \frac{1}{C_{\text{eff}}(A \to Z)}.$$

We can check that the effective resistance is the same if we switch $A$ and $Z$, so we often represent the effective resistance with a double arrow $R_{\text{eff}}(A \leftrightarrow Z)$. This is interesting from a probabilistic point of view, because we can reframe quantities in terms of these conductances and resistances. For simplicity, take $A = \{a\}$ to be a single point, and define the escape probability from $a$ to $Z$

$$P(a \to Z) = P_a\left( \inf\{n \ge 1 : Y_n \in Z\} < \inf\{n \ge 1 : Y_n = a\} \right).$$
(This is the probability that we escape to $Z$ before returning to $a$.) To calculate this, condition on the first step of the chain and write

$$P(a \to Z) = \sum_{y \sim a} P_a(Y_1 = y) \, P_y\left( \inf\{n \ge 0 : Y_n \in Z\} < \inf\{n \ge 0 : Y_n = a\} \right).$$

Letting $\tau$ be the first hitting time of $\{a\} \cup Z$, we can rewrite the last probability here as $E_y[1_Z(Y_\tau)] = E_y[1 - 1\{Y_\tau = a\}] = 1 - v(y)$, where $v$ is the voltage function corresponding to $v(a) = 1$ and $v(Z) = 0$. Thus, we have

$$P(a \to Z) = \sum_{y \sim a} p(a, y)(1 - v(y)) = \sum_{y \sim a} p(a, y)(v(a) - v(y)) = -Lv(a),$$

but as derived earlier, this means that

$$P(a \to Z) = -\frac{(\operatorname{div} i)(a)}{c(a)} = \frac{I(a \to Z)}{c(a)} = \frac{C_{\text{eff}}(a \leftrightarrow Z)}{c(a)}.$$

So we have a nice equivalence between a probabilistic quantity and an electrical one!
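The relation between the escape probability and the effective conductance can be tested on a path, where the series rule gives the effective resistance by hand. This is a sketch under stated assumptions: the voltage is computed by a simple averaging iteration (a numerical shortcut), and the path with edge conductances $1, 2, 2, 1$ is a made-up example with $a = 0$ and $Z = \{4\}$.

```python
def voltage(c, a, sinks, sweeps=5000):
    """Voltage with v(a) = 1, v = 0 on the sinks, and Lv = 0 elsewhere (by iteration)."""
    n = len(c)
    v = [1.0 if x == a else 0.0 for x in range(n)]
    for _ in range(sweeps):
        for x in range(n):
            if x != a and x not in sinks:
                v[x] = sum(c[x][y] * v[y] for y in range(n)) / sum(c[x])
    return v


def escape_probability(c, a, sinks):
    """P(a -> Z) = sum_y p(a, y) (v(a) - v(y)) = -Lv(a)."""
    v = voltage(c, a, sinks)
    c_a = sum(c[a])
    return sum(c[a][y] / c_a * (v[a] - v[y]) for y in range(len(c)))


# Hypothetical path 0 - 1 - 2 - 3 - 4 with edge conductances 1, 2, 2, 1.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 2.0, (3, 4): 1.0}
c = [[0.0] * 5 for _ in range(5)]
for (x, y), w in edges.items():
    c[x][y] = c[y][x] = w

p_escape = escape_probability(c, a=0, sinks={4})

# Series rule: resistances add along the path, so R_eff = 1 + 1/2 + 1/2 + 1 = 3.
series_resistance = sum(1.0 / w for w in edges.values())
conductance_ratio = (1.0 / series_resistance) / sum(c[0])  # C_eff / c(a)
```

Both `p_escape` and `conductance_ratio` should come out to $\frac{1}{3}$ for this network.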


In the time remaining today, we'll introduce the discrete Green kernel: 


Definition 165 
Suppose we divide our vertices again into a boundary $B \subseteq V$ and interior $U = V \setminus B$ (everything we define here depends on our choice of $U$, but we'll omit the subscript). Let $\tau = \inf\{n \ge 0 : Y_n \in B\}$ be the first time our chain leaves $U$. The Green kernel is defined by

$$G_x(y) = E_x\left[ \sum_{n=0}^{\tau - 1} 1\{Y_n = y\} \right].$$


This quantity $G_x(y)$ is the expected number of times our chain hits $y$ (before leaving $U$) when started from $x$, and it can also be rewritten as

$$G_x(y) = \sum_{n=0}^{\infty} E_x[1\{Y_n = y\}; \tau > n] = \sum_{n=0}^{\infty} p_n(x, y),$$

where $p_n(x, y)$ is the probability that we are at $y$ after $n$ steps and haven't left $U$ yet. We know that $c(x) p_n(x, y) = c(y) p_n(y, x)$ by reversibility, so the quantity $g_x(y) = \frac{G_x(y)}{c(y)}$ is symmetric in $x$ and $y$; in particular, if $G$ is the matrix with $G(x, y) = G_x(y)$, then $(GD^{-1})(x, y) = g_x(y)$ is a symmetric matrix. The key identity we should keep in mind is that for any $x \in U$, we can condition on the first step of the chain, so

$$g_x(x) = \frac{G_x(x)}{c(x)} = \frac{1}{c(x)} \left( 1 + E_x[G_{Y_1}(x)] \right) = \frac{1}{c(x)} + E_x\left[ \frac{G_x(Y_1) c(x)}{c(Y_1) c(x)} \right] = \frac{1}{c(x)} + E_x[g_x(Y_1)].$$

Thus, the left and right sides of this equation tell us that $Lg_x(x) = -\frac{1}{c(x)}$. However, we can check that for any other point $y \in U \setminus \{x\}$, we have $(Lg_x)(y) = 0$ (the same calculation goes through, but we don't get the $+1$ from the starting point, so there is no $\frac{1}{c(x)}$ term). This means that $Lg_x = -\frac{1_x}{c(x)}$, which can be rewritten as the matrix identity $\boxed{LG = -I}$ on $U$. So we can define $G$ in the probabilistic way (with the expected number of visits), but it turns out to also be equal to the negative of the matrix inverse of the Laplacian.


Remark 166. We should be a bit careful: remember that the definition of L does not depend on the choice of U, but 


in the identity above, we're restricting L to only contain the rows and columns corresponding to the vertices U. 
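The identity $LG = -I$ on $U$ can be verified by summing the killed transition probabilities directly. The sketch below uses simple random walk on the path $0 - 1 - 2 - 3 - 4$ with $B = \{0, 4\}$ (an illustrative choice) and truncates the series $\sum_n p_n(x, y)$ after 400 terms, which is far more than enough here since the killed walk dies out geometrically.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Simple random walk on the path 0 - 1 - 2 - 3 - 4, boundary B = {0, 4}, interior U = {1, 2, 3}.
interior = [1, 2, 3]
P_U = [[0.5 if abs(x - y) == 1 else 0.0 for y in interior] for x in interior]
L_U = [[P_U[i][j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]

# Green kernel as expected visit counts: G = sum_n (P_U)^n, truncated.
G = [[0.0] * 3 for _ in range(3)]
power = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
for _ in range(400):
    G = [[G[i][j] + power[i][j] for j in range(3)] for i in range(3)]
    power = mat_mul(power, P_U)

LG = mat_mul(L_U, G)
max_defect = max(abs(LG[i][j] + (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))
```

In this example `G[1][1]` is the expected number of visits to vertex 2 starting from vertex 2, which works out to 2, and `max_defect` measures how far `LG` is from $-I$.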
19 April 27, 2020


To give us a bit more time to think about the homework, the deadline is pushed to Thursday. To finish the class, there 
will be a test next Thursday, May 7, and a final problem set on Tuesday, May 12. 

Last time, we started discussing potential theory for discrete space and time: we let $G = (V, E, c)$ be a weighted graph, where the weights determine the transitions of the Markov chain. Defining the weighted adjacency matrix $A(i, j) = c(i, j)$ and letting $D = \operatorname{diag}(c(x))$ with $c(x) = \sum_y c(x, y)$ be the matrix which tracks the outdegree from each vertex, we find that $P = D^{-1}A$ is the transition matrix, and $L = P - I = D^{-1}(A - D)$. This was helpful for solving the Dirichlet problem, in which we have a vertex set $V = B \sqcup U$ and want to find a function $u : V \to \mathbb{R}$ such that $Lu = 0$ on $U$ and $u = f$ on $B$. Such a function exists and is unique, and the answer is given by $u(x) = E_x[f(Y_\tau)]$, where $\tau$ is the first hitting time of the boundary $B$.

We also introduced the Green kernel $G_U(x, y) = E_x\left[ \sum_{n=0}^{\tau - 1} 1\{Y_n = y\} \right]$, which tracks the total number of visits to $y$; we found that $G_U = -(L_{UU})^{-1}$ can be written in terms of the Laplacian matrix (but only taking the rows and columns from $U$). Furthermore, we can actually use this to rewrite the solution $u$. Writing $L$ in block form, the Dirichlet problem asks that

$$\begin{pmatrix} L_{UU} & L_{UB} \\ L_{BU} & L_{BB} \end{pmatrix} \begin{pmatrix} u_U \\ u_B \end{pmatrix} = \begin{pmatrix} 0 \\ * \end{pmatrix}, \qquad u_B = f,$$

so we must have

$$L_{UU} u_U + L_{UB} u_B = 0 \implies u_U = -(L_{UU})^{-1} L_{UB} f = G_U L_{UB} f.$$

Here, $K_U = G_U L_{UB}$ is also known as the discrete Poisson kernel, and we can write it out as $K_U(x, z) = \sum_y G_U(x, y) L_{UB}(y, z)$. But the only nonzero terms here come from $x, y \in U$, $z \in B$, so this can also be written as

$$K_U(x, z) = \sum_{y \in U} (G_U(x, y) - G_U(x, z)) \, p(y, z).$$

Here $G_U(x, z)$ is just zero, but the difference will become a derivative in the continuous analog. Indeed, the continuous setting is what we'll talk about today. We won't talk about things in full generality: we started with a general weighted graph in the discrete case, and it's possible to similarly use a general Feller process with infinitesimal generator $L$, which takes the place of the discrete Laplacian $L$. But in our case, we'll just discuss Brownian motion in $\mathbb{R}^d$, so we just have $L = \frac{1}{2}\Delta$.


Definition 167 


Let $U \subseteq \mathbb{R}^d$ be an open subset. A function $u \in L^1_{\mathrm{loc}}(U)$ (locally integrable, so in particular bounded on compact sets is strong enough) satisfies the mean value property on $U$ if for all $x \in U$ and $r > 0$ such that $B_r(x) \subseteq U$,

$$u(x) = \frac{1}{|B_r(x)|} \int_{B_r(x)} u(y) \, dy = \frac{1}{|S_r(x)|} \int_{S_r(x)} u(y) \, dy,$$

where $|B_r(x)|$, $|S_r(x)|$ denote the volume of the ball $B_r(x)$ and sphere $S_r(x) = \partial B_r(x)$, respectively.


We'll make use of the following analysis fact: 


Fact 168 


A function $f \in L^1_{\mathrm{loc}}(U)$ satisfies the mean value property on $U$ if and only if $f$ is harmonic on $U$, meaning that

$$\Delta f = \sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2} = 0.$$

(In particular, this implies that $f$ is twice continuously differentiable and in fact smooth.) This will help us in our study of the continuous Dirichlet problem: we'll assume for simplicity that $U$ is a bounded domain and our boundary condition is a continuous function $f : \partial U \to \mathbb{R}$. Our goal is then to find a continuous function $u : \overline{U} \to \mathbb{R}$ such that $\Delta u = 0$ on $U$ and $u = f$ on $\partial U$. It turns out that the solution looks similar to the discrete case as long as we have some regularity condition:


Theorem 169 


If $U$ satisfies the “exterior cone condition,” then the solution to the Dirichlet problem is $u(x) = E_x[f(B_\tau)]$, where $\tau$ is the hitting time of the boundary (that is, $\tau = \inf\{t : B_t \notin U\}$).


We won't actually deal with domains that don’t satisfy the exterior cone condition in this class, so we won't worry 


too much about that detail. 


Proof sketch. Because $U$ is a bounded domain, $\partial U$ is compact. Since $f$ is continuous and defined on a compact domain, it is bounded, and thus the function $u$ we define above is bounded (in particular, it's definitely in $L^1_{\mathrm{loc}}$). To show that $u$ satisfies the boundary condition, we need the exterior cone condition (details here omitted). For the mean value property, if we consider a ball $B_r(x)$ and let $\sigma$ be the hitting time of the boundary $S_r(x)$ when we start a Brownian motion from $x$, then the strong Markov property tells us that

$$E_x[f(B_\tau) \mid \mathcal{F}_\sigma] = E_{B_\sigma}[f(B_\tau)] = u(B_\sigma),$$

and plugging this in after using the law of iterated expectation yields

$$u(x) = E[E[f(B_\tau) \mid \mathcal{F}_\sigma] \mid B_0 = x] = E[u(B_\sigma) \mid B_0 = x] = \frac{1}{|S_r(x)|} \int_{S_r(x)} u(y) \, dy,$$


since the distribution of the sphere hitting point is uniform. And the left and right sides of this equation yield the 


desired mean-value property. 
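The mean value property is easy to probe numerically in the plane. The sketch below averages a function over equally spaced points on a circle (the quadrature rule, the helper name, the center, and the radius are illustrative assumptions): for the harmonic function $x^2 - y^2$ the circle average matches the value at the center, while for $x^2 + y^2$ it overshoots by exactly $r^2$.

```python
import math


def circle_average(u, center, r, n_points=360):
    """Average of u over n_points equally spaced points on the circle of radius r."""
    x0, y0 = center
    total = 0.0
    for k in range(n_points):
        theta = 2.0 * math.pi * k / n_points
        total += u(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / n_points


harmonic = lambda x, y: x * x - y * y        # Laplacian is 2 - 2 = 0
not_harmonic = lambda x, y: x * x + y * y    # Laplacian is 4

center, r = (0.3, -1.2), 0.75
avg_harmonic = circle_average(harmonic, center, r)
avg_not_harmonic = circle_average(not_harmonic, center, r)
```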


Next, we'll examine the continuous Green kernel: we can't exactly define a “number of visits” in $\mathbb{R}^d$, but we can just use a density instead.


Definition 170


Let $U \subseteq \mathbb{R}^d$, and let $p_t(x, y; U)$ be the transition kernel of Brownian motion that is killed upon exiting $U$ (more explicitly, $P_x(B_t \in A; \tau_U > t) = \int_A p_t(x, y; U) \, dy$ for all $A$). The Green kernel on $U$ is $G_U(x, y) = \int_0^\infty p_t(x, y; U) \, dt$.


In other words, the total time that we expect to spend in a set $A$ is

$$E_x\left[ \int_0^{\tau_U} 1\{B_t \in A\} \, dt \right] = \int_A G_U(x, y) \, dy.$$


This integral will be finite except maybe at $y = x$ because we have a bounded domain, but we won't worry too much about those details. We showed in the discrete case that $G_U = -(L_U)^{-1}$, meaning that the function $G_U(x, \cdot)$ is harmonic on $U \setminus \{x\}$ and $(LG_U)(x, x) = -1$. The first statement is still true in the continuous case, but the second no longer works because $G_U$ is singular at $x$. Thus, we'll need to restate the “inverse” condition: let $P_{t,U}$ be the operator such that

$$P_{t,U} f(x) = \int_U p_t(x, y; U) f(y) \, dy.$$

Note that this expression is also equal to $E_x[f(B_t); \tau_U > t]$. We then also define $G_U$ to act on functions via integration:

$$G_U f(x) = \int_U G_U(x, y) f(y) \, dy = \int_0^\infty P_{t,U} f(x) \, dt.$$


Theorem 171 


The Green kernel inverts the Laplacian, meaning that for any smooth function $f$ with compact support in $U$, we have $-\frac{1}{2} \Delta G_U f = f$.


Proof sketch. Consider the quantity

$$\frac{1}{t} \int_0^t P_{s,U} f(x) \, ds = \frac{1}{t} \left( \int_0^\infty P_{s,U} f(x) \, ds - \int_t^\infty P_{s,U} f(x) \, ds \right).$$

The first integral on the right-hand side is $G_U f(x)$, while the second can be rewritten as $P_{t,U} G_U f(x)$ by Chapman-Kolmogorov, so the expression is equal to $-\frac{1}{t}(P_{t,U} - I) G_U f(x)$. But now as $t \to 0$, the left-hand side converges to $f(x)$, while the $\frac{P_{t,U} - I}{t}$ in the final expression converges to the generator $\frac{1}{2}\Delta$ of the process.


Everything we've been discussing so far has been using probabilistic quantities to say things about harmonic functions, but we can also work in reverse. In $\mathbb{R}^d$, we have the standard Green kernel

$$\Gamma(x, y) = \begin{cases} \dfrac{|x - y|^{2-d}}{(d-2)|S^{d-1}|} & d \ne 2, \\[2ex] \dfrac{1}{2\pi} \log \dfrac{1}{|x - y|} & d = 2, \end{cases}$$

where $|S^{d-1}|$ is the volume of the standard sphere. We can check by directly computing the derivative that $\Gamma(x, \cdot)$ is always harmonic on $\mathbb{R}^d \setminus \{x\}$, so this helps us calculate probabilities for Brownian motion:


Example 172
Consider two balls of radius $\varepsilon$ and $R$ centered at the origin, and consider a Brownian motion started at some $x$ in the annulus $U$ between the balls, stopped when it hits either boundary. We wish to compute the probability $P_x(\tau_\varepsilon < \tau_R)$.

Letting $f : \partial U \to \mathbb{R}$ be the function which is 1 on the inner boundary $\partial B_\varepsilon$ and 0 on the outer boundary $\partial B_R$, we are trying to compute $E_x[f(B_\tau)]$. But we can construct the harmonic interpolation explicitly in this case: we have

\[ u(x) = P_x(\tau_\varepsilon < \tau_R) = \begin{cases} \dfrac{R^{2-d} - |x|^{2-d}}{R^{2-d} - \varepsilon^{2-d}} & d \ne 2, \\[2mm] \dfrac{\log R - \log |x|}{\log R - \log \varepsilon} & d = 2, \end{cases} \]

where we've used the fact that $\Gamma(x, \cdot)$ is harmonic, and we can verify that the boundary conditions are indeed satisfied. So this tells us an exact probability for the Brownian motion hitting distance $\varepsilon$ before distance $R$, and now if we take $R \to \infty$, the chance that $\tau_\varepsilon < \tau_R$ goes to 1 for $d = 1, 2$, but it goes to $(\varepsilon/|x|)^{d-2} < 1$ for $d \ge 3$. So Brownian motion starting at $x$ always hits any ball not containing $x$ in dimensions 1 or 2, but this doesn't necessarily occur in larger dimensions. In other words, Brownian motion is recurrent for $d = 1$ or 2 but transient otherwise.
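The closed form in Example 172 is easy to evaluate numerically. Here is a minimal Python sketch; the function names `hit_inner_first` and `escape_limit` are our own (hypothetical) choices, and the code only evaluates the formula above, it does not simulate Brownian motion.

```python
import math

def hit_inner_first(x_norm, eps, R, d):
    """P_x(tau_eps < tau_R) for Brownian motion in R^d started at |x| = x_norm,
    where 0 < eps <= x_norm <= R and eps < R."""
    if not (0 < eps <= x_norm <= R and eps < R):
        raise ValueError("need 0 < eps <= |x| <= R and eps < R")
    if d == 2:
        return (math.log(R) - math.log(x_norm)) / (math.log(R) - math.log(eps))
    p = 2 - d  # exponent of the radial harmonic function |x|^(2-d)
    return (R ** p - x_norm ** p) / (R ** p - eps ** p)

def escape_limit(x_norm, eps, d):
    """Limit of hit_inner_first as R -> infinity: 1 if d <= 2, (eps/|x|)^(d-2) if d >= 3."""
    return 1.0 if d <= 2 else (eps / x_norm) ** (d - 2)

# Example: in d = 3, starting at distance 2 from the origin, the unit ball is hit
# with probability close to 1/2 once R is large.
p_large_R = hit_inner_first(2.0, 1.0, 1e9, 3)
```

Comparing `hit_inner_first(2.0, 1.0, R, d)` for growing `R` in dimensions 2 and 3 shows the slow (logarithmic) convergence to 1 in the plane versus the limit $(\varepsilon/|x|)^{d-2}$ in the transient case.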

Now, we can compare our $G_U(x, y)$ (defined for a bounded $U$) to the classical $\Gamma(x, y)$ (defined for the whole space). In $\mathbb{R}^d$, $\Gamma$ inverts the Laplacian, meaning that integrating against a smooth, compactly supported test function $f$ yields

\[ \int_{\mathbb{R}^d} \Gamma(x, y)\, \Delta f(y)\,dy = -f(x). \]

It turns out that we can actually take $U = \mathbb{R}^d$ in the definition of $G_U(x, y) = \int_0^\infty p_t(x, y; U)\,dt$: we have that $\int_0^\infty p_t(x, y; \mathbb{R}^d)\,dt = 2\Gamma(x, y)$ for $d \ge 3$ but not for $d = 1, 2$ (because the integral diverges). Nevertheless, we can still get a relation between the Green kernel (with occupation densities) and the classical kernel: we recenter by taking any fixed vector $w$ of norm 1, and then we have

\[ \int_0^\infty \left( p_t(x, y; \mathbb{R}^d) - p_t(x, x + w; \mathbb{R}^d) \right) dt = 2\Gamma(x, y). \]


Example 173 


We'll finish by discussing the Feynman—Kac formula, doing a discrete-time version to give some intuition for the 


continuous-time version on our homework. 


As before, let $G = (V, E, c)$ define a reversible Markov chain $Y_n$ with transition probabilities $p(x, y)$. Let $f : V \to \mathbb{R}$ and $w : V \to [0, \infty)$ be two functions, and define

\[ u(n, x) = E_x\left[ f(Y_n) \prod_{k=0}^{n-1} \frac{1}{1 + w(Y_k)} \right]. \]

Then we can calculate $u(n + 1, x)$ by conditioning on the first step of the chain, so that

\[ u(n + 1, x) = \sum_y p(x, y)\, \frac{u(n, y)}{1 + w(x)} = \frac{1}{1 + w(x)} \left[ \sum_y p(x, y) \big( u(n, y) - u(n, x) \big) + u(n, x) \right]. \]

But the sum over $y$ in the right expression is just the discrete Laplacian $L_x u(n, x)$, meaning that after some rearranging, we arrive at

\[ u(n + 1, x) - u(n, x) = L_x u(n, x) - w(x)\, u(n + 1, x). \]


This is a discrete PDE with initial condition $u(0, x) = f(x)$. In the continuous version of this result, we're similarly given two functions $f : \mathbb{R}^d \to \mathbb{R}$ and $w : \mathbb{R}^d \to [0, \infty)$, and we define

\[ u(t, x) = E_x\left[ f(B_t) \exp\left( -\int_0^t w(B_s)\,ds \right) \right]. \]

We can then find (see homework) that $u$ solves the partial differential equation

\[ \partial_t u(t, x) = \frac{1}{2} \Delta_x u(t, x) - w(x)\, u(t, x), \]

again with initial condition $u(0, x) = f(x)$. The special case $w = 0$ just gives us the heat equation (heat diffuses like Brownian motion), and the solution in that case is $u(t, x) = E_x[f(B_t)]$. And since $\partial_t u$ should be zero at equilibrium (and thus $u$ is a harmonic function), the only way for $u$ to not be constant is if we have a nonconstant boundary condition.
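The discrete-time Feynman-Kac recursion can be checked on a tiny chain. The sketch below is only an illustration: the three-state transition matrix `P`, the functions `f` and `w`, and all function names are made-up choices of ours. It iterates the one-step recursion $u(n+1, x) = \frac{1}{1 + w(x)} \sum_y p(x, y)\, u(n, y)$ and compares the result with a brute-force evaluation of the expectation over all paths.

```python
from itertools import product

def feynman_kac_recursion(P, f, w, n):
    """Iterate u(k+1, x) = sum_y P[x][y] * u(k, y) / (1 + w[x]) from u(0, .) = f."""
    u = list(f)
    for _ in range(n):
        u = [sum(P[x][y] * u[y] for y in range(len(f))) / (1.0 + w[x])
             for x in range(len(f))]
    return u

def feynman_kac_paths(P, f, w, n, x):
    """Brute force E_x[f(Y_n) * prod_{k<n} 1/(1 + w(Y_k))] by enumerating all paths."""
    total = 0.0
    for tail in product(range(len(f)), repeat=n):
        path = (x,) + tail
        prob, weight = 1.0, 1.0
        for k in range(n):
            prob *= P[path[k]][path[k + 1]]   # probability of the k-th step
            weight /= 1.0 + w[path[k]]        # discount collected at time k
        total += prob * weight * f[path[n]]
    return total

# A small reversible chain (lazy walk on a path with three vertices).
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
f = [1.0, -2.0, 3.0]
w = [0.0, 0.5, 2.0]

u4 = feynman_kac_recursion(P, f, w, 4)
u5 = feynman_kac_recursion(P, f, w, 5)
# Residual of the discrete PDE u(5, x) - u(4, x) = L u(4, x) - w(x) u(5, x).
residual = [u5[x] - u4[x]
            - sum(P[x][y] * (u4[y] - u4[x]) for y in range(3))
            + w[x] * u5[x]
            for x in range(3)]
```

The path enumeration costs $|V|^n$ operations, so it is only meant as a sanity check of the recursion for small $n$.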


20 April 29, 2020 


We'll start with Chapter 8 of Le Gall today, discussing stochastic differential equations, existence and uniqueness of 


solutions, and the case where we have Lipschitz coefficients. 
Definition 174

Let $\sigma, b : [0, \infty) \times \mathbb{R} \to \mathbb{R}$ be real-valued functions of time and space that are locally bounded and measurable. A (weak) solution of the stochastic differential equation (SDE)

\[ E(\sigma, b) : \quad dX_t = \sigma(t, X_t)\,dB_t + b(t, X_t)\,dt \]

(this is the usual informal notation) consists of the following:
• a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ (we'll assume the filtration $(\mathcal{F}_t)$ is complete),
• an $(\mathcal{F}_t)$-Brownian motion $B_t$, and
• an $(\mathcal{F}_t)$-adapted process $X_t$ with continuous sample paths, such that

\[ X_t = X_0 + \int_0^t \sigma(s, X_s)\,dB_s + \int_0^t b(s, X_s)\,ds. \]

Let $\Phi(X)_t$ denote the right-hand side of this last equation (it implicitly depends on $\Omega$, $\mathcal{F}$, $(\mathcal{F}_t)$, and $B$). Because $\sigma$ and $b$ are locally bounded, $\int_0^t \sigma(s, X_s)\,dB_s$ is a local martingale $M(X)_t$, and $\int_0^t b(s, X_s)\,ds$ is a finite variation process $A(X)_t$, so we've had practice working with these types of objects already. If $X_0 = x \in \mathbb{R}$, then we say that $X$ is a solution for $E_x(\sigma, b)$.

We'll focus on the one-dimensional case, though many results generalize to the multi-dimensional case. The way to think about this SDE is that it is the system governed by the ODE $df(t) = b(t, f(t))\,dt$ but with some additional noise $\sigma\,dB_t$. We have existence and uniqueness of solutions for the ODE under mild conditions, so we'd like to establish an analogous idea for SDEs. However, things become a bit more complicated: in particular, there are a few notions of what a "solution" means here:


Definition 175
A weak solution of $E(\sigma, b)$ is defined as in Definition 174. A strong solution also satisfies the additional condition that $X_t$ is adapted to the Brownian filtration $\sigma(B_s : s \le t) \subset \mathcal{F}_t$.

The idea is that $X$ appears on both sides of the equation defining a weak solution, so we may want to solve for an $X$ where all of the randomness comes from the Brownian motion alone.


Definition 176 


Again take the definition of $E(\sigma, b)$ from above. We have weak uniqueness of solutions if all solutions of $E_x(\sigma, b)$ have the same law, and pathwise uniqueness if, given $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$, any two solutions $X$ and $Y$ with $X_0 = Y_0$ almost surely are indistinguishable.


Assuming that we have weak existence (so that we do have a weak solution), pathwise uniqueness is stronger than weak uniqueness. This isn't an obvious fact. We can read the book for an example where an SDE has weak uniqueness but not pathwise uniqueness (we can construct a probability space carrying two solutions that are not indistinguishable). Showing that pathwise uniqueness implies weak uniqueness is the Yamada-Watanabe theorem. The subtlety is that indistinguishability requires us to look at a single probability space: if we are given $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$ and a starting point $x$, we have a unique solution $X$. But then if we have a different probability space and are given $(\Omega', \mathcal{F}', (\mathcal{F}'_t), P', B', x)$, we will also have a unique solution $X'$, and the theorem tells us that $X$ and $X'$ will have the same law.


We'll first note down a technical result: 
Fact 177 (Gronwall's lemma)

Let $g$ be a nonnegative bounded function on $[0, T]$, and suppose there exist $a, b \ge 0$ such that

\[ g(t) \le a + b \int_0^t g(s)\,ds \]

for all $t \in [0, T]$. Then $g(t) \le a e^{bt}$ for all $t \in [0, T]$.

This is a fact about deterministic functions. It's basically a calculus fact, so we'll omit the proof. Note that if we had equality, this would be easy, since we would have $g'(t) = b\,g(t)$ with initial condition $g(0) = a$, and this is uniquely solved by $g(t) = a e^{bt}$.

We'll be considering a class of processes where we can prove all of the things that we want: specifically, we'll assume our coefficient functions $\sigma, b : [0, \infty) \times \mathbb{R} \to \mathbb{R}$ are continuous (jointly as a function of space and time) and $K$-Lipschitz in the space coordinate, meaning that for all $t \ge 0$ and $x, y \in \mathbb{R}$, we have

\[ |\sigma(t, x) - \sigma(t, y)| \le K|x - y|, \qquad |b(t, x) - b(t, y)| \le K|x - y|. \]

Theorem 178
If $\sigma, b$ are continuous and $K$-Lipschitz, then for all $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$ and all $x \in \mathbb{R}$, there exists a strong solution $X$ for $E_x(\sigma, b)$ on the given probability space, and we have pathwise uniqueness of solutions (meaning any other solution $Y$ is indistinguishable from $X$, so all solutions are strong).


Proof. We first show pathwise uniqueness. We are already given $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$, and suppose that $X, Y$ are both solutions on this space with $X_0 = Y_0$ almost surely. (For this part, the starting point does not need to be fixed.) Our goal is to show that $X$ and $Y$ are indistinguishable. Let $\tau$ be the stopping time

\[ \tau = \inf\{ t \ge 0 : |X_t - X_0| \ge M \text{ or } |Y_t - Y_0| \ge M \}. \]

Consider the function $h(t) = E\left[ (X_{t \wedge \tau} - Y_{t \wedge \tau})^2 \right]$. Since $X$ and $Y$ have the same starting point and we stop before moving more than $M$ away from the starting point, $h$ is bounded by $(2M)^2$. Now writing $X$ and $Y$ as a sum of the local martingale and FV parts and using the trivial inequality $(u + v)^2 \le 2(u^2 + v^2)$, we in fact have

\[ h(t) \le 2 E\left[ \left( \int_0^{t \wedge \tau} (\sigma(s, X_s) - \sigma(s, Y_s))\,dB_s \right)^2 \right] + 2 E\left[ \left( \int_0^{t \wedge \tau} (b(s, X_s) - b(s, Y_s))\,ds \right)^2 \right]. \]

The first term is the expectation of the square of a stochastic integral, and recall that we've previously proven that $E\left[ \left( \int_0^t H_s\,dM_s \right)^2 \right] \le E\left[ \int_0^t H_s^2\,d\langle M \rangle_s \right]$. Using this and also applying Cauchy-Schwarz on the second term tells us that

\[ h(t) \le 2 E\left[ \int_0^{t \wedge \tau} (\sigma(s, X_s) - \sigma(s, Y_s))^2\,ds \right] + 2t\, E\left[ \int_0^{t \wedge \tau} (b(s, X_s) - b(s, Y_s))^2\,ds \right]. \]


Now applying the Lipschitz condition, we have

\[ h(t) \le 2K^2 (1 + t)\, E\left[ \int_0^{t \wedge \tau} (X_s - Y_s)^2\,ds \right] \le 2K^2 (1 + t) \int_0^t h(s)\,ds \le 2K^2 (1 + T) \int_0^t h(s)\,ds \]

for all $t \le T$. Since $h$ is a bounded nonnegative function, Gronwall's lemma tells us that $h = 0$ on $[0, T]$ (because the constant term is 0), so $X_{t \wedge \tau} = Y_{t \wedge \tau}$ almost surely for all $t \in [0, T]$. Now take $M \to \infty$ and $T \to \infty$ to show that $X_t = Y_t$ almost surely for all $t$. By definition this doesn't mean we have indistinguishability, but in this case the assumption of continuity does imply that $X$ and $Y$ are indistinguishable, proving pathwise uniqueness.

Now we'll show existence of a strong solution, and the calculation here will be fairly similar to what we've just done. Remember that we're working with a given $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$, and a weak solution is just a solution of the fixed point equation $X = \Phi(X)$. This motivates the idea of iterating $\Phi$: let $X^0$ be the constant process such that $(X^0)_t = x$, and define $X^n = \Phi^n(X^0)$ for all $n$. Our goal is to show that $X^n$ converges to the fixed point solution satisfying $X = \Phi(X)$, and we'll do this by bounding the difference between $X^n$ and $X^{n+1}$. Define

\[ g_n(t) = E\left[ \sup_{s \le t} (X_s^{n+1} - X_s^n)^2 \right]. \]


By the same calculation strategy as before and writing $X^{n+1}$ and $X^n$ in terms of $X^n$ and $X^{n-1}$, respectively, we find that

\[ g_n(t) \le 2 E\left[ \sup_{s \le t} \left( \int_0^s (\sigma(r, X_r^n) - \sigma(r, X_r^{n-1}))\,dB_r \right)^2 \right] + 2 E\left[ \sup_{s \le t} \left( \int_0^s (b(r, X_r^n) - b(r, X_r^{n-1}))\,dr \right)^2 \right]. \]

Now the first term can be controlled because it is the supremum of a local martingale, so we can apply Doob's $L^2$ inequality and then use the same inequality as before. Then also applying Cauchy-Schwarz to the second term yields

\[ g_n(t) \le 8 E\left[ \int_0^t (\sigma(s, X_s^n) - \sigma(s, X_s^{n-1}))^2\,ds \right] + 2t\, E\left[ \int_0^t (b(s, X_s^n) - b(s, X_s^{n-1}))^2\,ds \right]. \]

Finally, applying the Lipschitz assumption we find that

\[ g_n(t) \le 2K^2 (4 + t)\, E\left[ \int_0^t (X_s^n - X_s^{n-1})^2\,ds \right] \le 2K^2 (4 + T) \int_0^t g_{n-1}(s)\,ds. \]


This means we have a bound for $g_n$ in terms of $g_{n-1}$, and now we'll just do the base case

\[ g_0(t) = E\left[ \sup_{s \le t} (X_s^1 - x)^2 \right], \]

where $X_t^1 = \Phi(X^0)_t = x + \int_0^t \sigma(r, x)\,dB_r + \int_0^t b(r, x)\,dr$. Because $\sigma, b$ are both locally bounded, $g_0(t)$ is bounded by some constant $c(T)$ for all $t \in [0, T]$ (notice that we're bounding the second moment of $\sup_{s \le t} |X_s^1 - x|$, not the random function itself). Then inductively integrating the recursive bound above yields

\[ g_n(t) \le c(T)\, \big( 2K^2 (4 + T) \big)^n\, \frac{t^n}{n!}. \]

But this decays quickly because of the $n!$ in the denominator, which means that when we sum the sup-norm differences, we get

\[ E\left[ \sum_n \sup_{t \le T} |X_t^{n+1} - X_t^n| \right] \le \sum_n \sqrt{g_n(T)} < \infty. \]

Thus, $\sum_n \sup_{t \le T} |X_t^{n+1} - X_t^n|$ is almost surely finite for any $T$, which means that $X^n$ converges uniformly to some process $X$ on $[0, T]$ (and in general on any compact time interval). We know that $X^1$ is adapted to the Brownian filtration, and in general integrating against a Brownian motion still keeps things adapted, so $X^n$ is adapted to the Brownian filtration for all $n$ and thus the limit process $X$ is adapted as well. So we just need to check that this is actually a (weak) solution to $X = \Phi(X)$, but we showed that $X^n = \Phi(X^{n-1})$, so taking $n \to \infty$ makes both $X^n$ and $X^{n-1}$ in the above equation converge to $X$. Thus it remains to check that

\[ \Phi(X^{n-1})_t = x + \int_0^t \sigma(s, X_s^{n-1})\,dB_s + \int_0^t b(s, X_s^{n-1})\,ds \]
converges to $\Phi(X)_t$. It's clear that the second term converges to $\int_0^t b(s, X_s)\,ds$ because $b$ is continuous and $X^{n-1}$ converges uniformly to $X$ (so we can apply the dominated convergence theorem). To argue that the first term converges correctly, we use the dominated convergence theorem for stochastic integrals on the difference $\int_0^t (\sigma(s, X_s) - \sigma(s, X_s^{n-1}))\,dB_s$, where the dominating process is $D_s = K \sum_{n \ge 0} \sup_{r \le s} |X_r^{n+1} - X_r^n|$. This finishes the verification and constructs our strong solution $X$.
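The Picard iteration in this proof can be mimicked on a time grid. The following Python sketch is an illustration under assumptions of our own: we replace both integrals in $\Phi$ by left-endpoint sums along a fixed list of simulated Brownian increments `dB`, and the coefficients `sigma`, `b` and all function names are hypothetical examples. With this discretisation, the $n$-th iterate agrees with the Euler-Maruyama path at the first $n$ grid points, so the iteration reaches its fixed point after finitely many steps.

```python
import math
import random

def picard_iterates(sigma, b, x0, dB, dt, n_iter):
    """Return [X^0, X^1, ..., X^n_iter], where X^{n+1} = Phi(X^n) and
    Phi(X)_k = x0 + sum_{j<k} sigma(t_j, X_j) dB_j + sum_{j<k} b(t_j, X_j) dt."""
    n_steps = len(dB)
    current = [x0] * (n_steps + 1)  # X^0 is the constant path
    history = [current]
    for _ in range(n_iter):
        new = [x0]
        for j in range(n_steps):
            t = j * dt
            # Increments are evaluated along the previous iterate, as in X^{n+1} = Phi(X^n).
            new.append(new[-1] + sigma(t, current[j]) * dB[j] + b(t, current[j]) * dt)
        current = new
        history.append(current)
    return history

def euler_maruyama(sigma, b, x0, dB, dt):
    """Fixed point of the discretised map Phi: the usual Euler-Maruyama scheme."""
    path = [x0]
    for j in range(len(dB)):
        t = j * dt
        path.append(path[-1] + sigma(t, path[-1]) * dB[j] + b(t, path[-1]) * dt)
    return path

def sup_distance(u, v):
    return max(abs(p - q) for p, q in zip(u, v))

def sigma(t, x):
    return 1.0 + 0.5 * math.sin(x)  # Lipschitz in x

def b(t, x):
    return -x                       # Lipschitz in x

random.seed(0)
n_steps, horizon = 50, 1.0
dt = horizon / n_steps
dB = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

history = picard_iterates(sigma, b, 1.0, dB, dt, n_steps + 1)
gaps = [sup_distance(history[n + 1], history[n]) for n in range(n_steps + 1)]
reference = euler_maruyama(sigma, b, 1.0, dB, dt)
```

Printing `gaps` shows the sup-norm differences between consecutive iterates along one sample path; in the continuous-time proof the analogous quantities are controlled in $L^2$ by $g_n(T)$.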


For the next result, recall that the Wiener measure is the law of Brownian motion started from 0. In particular, it is a measure on the space of continuous functions $C([0, \infty), \mathbb{R})$.

Theorem 179

Consider the space of functions $C([0, \infty), \mathbb{R})$. Let $W$ be the Wiener measure, $\mathcal{B}_C$ the Borel sigma-algebra on $C([0, \infty), \mathbb{R})$, and $\mathcal{G}$ the sigma-algebra $\sigma(\mathcal{B}_C, \mathcal{N})$, where $\mathcal{N}$ is the set of $W$-negligible sets. If $\sigma, b$ are continuous and $K$-Lipschitz, then for all $x \in \mathbb{R}$ there exists a measurable function $F_x : (C([0, \infty), \mathbb{R}), \mathcal{G}) \to (C([0, \infty), \mathbb{R}), \mathcal{B}_C)$ such that the following hold:

• for all $t$, $F_x(w)_t$ coincides almost surely with a measurable function of $(w(s) : s \le t)$,

• for all $w$, the map $x \mapsto F_x(w)$ is continuous as a map $\mathbb{R} \to C([0, \infty), \mathbb{R})$,

• for all choices of $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$ and for all $x$, $F_x(B)$ is the unique solution of $E_x(\sigma, b)$, and this is also true if we replace $x$ with a random starting point $U \in \mathcal{F}_0$.


The second point here tells us that if we start from two points $x, y \in \mathbb{R}$ that are close to each other, and we evolve the SDE using the same Brownian motion, our paths will look similar when $\sigma, b$ are Lipschitz. What we'll do is apply Theorem 178 with the filtered probability space

\[ (\Omega, \mathcal{F}, (\mathcal{F}_t), P, B) = \big( C([0, \infty), \mathbb{R}),\ \mathcal{G},\ \mathcal{G}_t = \sigma(w(s) : s \le t,\ \mathcal{N}),\ W,\ w \big), \]

where $w$ is the canonical Brownian motion. Then the solution we get out of the theorem will be $F_x(w)$. This is adapted to the Brownian filtration, so we automatically satisfy the first point. So we just need to show that the mapping is continuous and that this works on any probability space, and we'll do this next time.


21 May 4, 2020 


Recall that we've been looking at the stochastic differential equation $dX_t = \sigma(t, X_t)\,dB_t + b(t, X_t)\,dt$, where $\sigma$ and $b$ are $K$-Lipschitz in the $x$ coordinate. Last time, we showed that given any $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$, we can find a strong solution $X^x$ started at $x$ and adapted to the Brownian motion, and this solution is unique by pathwise uniqueness. (The Yamada-Watanabe theorem, which states that pathwise uniqueness implies weak uniqueness, can be applied here, but we can also prove weak uniqueness directly in this $K$-Lipschitz case.) We'll now start with a proof of last time's result, which stated that we have a measurable mapping $F_x : (C([0, \infty), \mathbb{R}), \mathcal{G}) \to (C([0, \infty), \mathbb{R}), \mathcal{B}_C)$ such that $F_x(w)_t$ is a measurable function of $(w_s)_{s \le t}$, the map $x \mapsto F_x(w)$ is continuous in $x$, and $F_x(B)$ solves the stochastic differential equation $E_x(\sigma, b)$.


Proof of Theorem 179. We already showed there exists a strong solution $X^x$ for any $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B, x)$, and we'll apply that here to the space $(C([0, \infty), \mathbb{R}), \mathcal{G}, (\mathcal{G}_t), W, w, x)$. We want to show continuity, and we'll make use of the Kolmogorov continuity lemma: recall that if we have any stochastic process $(F_t)$ which takes values in a complete separable metric space $(S, d)$ with the bound $E[d(F_s, F_t)^q] \le C|s - t|^{1 + \varepsilon}$, then there is a modification $\tilde{F}$ of $F$ that is $\alpha$-Hölder continuous for all $\alpha \in (0, \varepsilon/q)$. We applied this to Brownian motion earlier in the class, and remember that one part of the proof was to do a union bound (for the interval $[0, 1]$), which by Markov's inequality gives

\[ P\left( |X_{i/2^n} - X_{(i-1)/2^n}| \le \left( \frac{1}{2^n} \right)^{\alpha} \text{ for all } 1 \le i \le 2^n \right) \ge 1 - C\, 2^{-n(\varepsilon - q\alpha)}. \]


In particular, this does not rely on having independent increments! So we'll apply the Kolmogorov continuity lemma to $F_x(w)$ (where we index by position $x \in \mathbb{R}$ instead of by time), thinking of this process as taking values in the metric space $(S, d)$ given by

\[ S = C([0, \infty), \mathbb{R}), \qquad d(f, g) = \sum_{n \ge 1} \frac{1}{2^n} \min\left( 1,\ \sup_{t \le n} |f(t) - g(t)| \right). \]

Checking the assumptions of the continuity lemma requires the moment estimate above, but we can read that on our own, and once we verify this we do indeed get continuity. The last thing we must check is that $F_x(B)$ solves our SDE given any $(\Omega, \mathcal{F}, (\mathcal{F}_t), P, B)$; we know it's true on the Wiener space, but we need to check that it's true for any Brownian motion $B$. If we let $w$ be our Brownian motion on the Wiener space, then $F_x(w)$ solves $E_x(\sigma, b)$, meaning

\[ F_x(w)_t - \left( x + \int_0^t \sigma(s, F_x(w)_s)\,dw(s) + \int_0^t b(s, F_x(w)_s)\,ds \right) = 0. \]

Letting $\Psi(w)_t$ denote the left-hand side, we have $\int |\Psi(w)_t|\,dW(w) = 0$. But then any Brownian motion $B$ has the same law as $w$, so we must also have $\int |\Psi(B)_t|\,dP = 0$, which implies that $F_x(B)$ is a valid solution to our SDE.


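Remark 180. As a technical sidenote, we do need to make sure $\Psi$ is measurable as a function of $w$. The main difficulty here is showing that the stochastic integral $\int_0^t \sigma(s, F_x(w)_s)\,dw(s)$ is a measurable function of $w$, but this follows from the approximation (with the limit taken in probability)

\[ \int_0^t \sigma(s, F_x(w)_s)\,dw(s) = \lim_{n \to \infty} \sum_{i=1}^{n} \sigma\left( \tfrac{(i-1)t}{n},\ F_x(w)_{(i-1)t/n} \right) \left( w\left( \tfrac{it}{n} \right) - w\left( \tfrac{(i-1)t}{n} \right) \right). \]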


Note that this theorem we've just proved implies weak uniqueness without needing the Yamada-Watanabe theorem: the solution comes from applying the same map $F_x$ to our Brownian motion, no matter what probability space we're on, so the law of $X^x$ is determined. For any event $A$ we have

\[ P(X^x \in A) = P(F_x(B) \in A) = P(B \in (F_x)^{-1} A), \]

and then because the law of Brownian motion is given by the Wiener measure, this is just $W((F_x)^{-1} A)$. So the law is just the pushforward $(F_x)_* W$.

Remark 181. As stated last lecture, if we replace our starting point $X_0 = x$ with a random variable $X_0 = U \in \mathcal{F}_0$, we can still get a solution to our SDE with $F_U(B)$. But we can read the book for more details.


We will now make a connection to Markov processes: suppose for this part of the lecture that $\sigma$ and $b$ don't depend on time and that they are still $K$-Lipschitz in $x$.

Theorem 182
Suppose $\sigma(t, x) = \sigma(x)$ and $b(t, x) = b(x)$ are $K$-Lipschitz, and let $X$ be a solution of the SDE $E(\sigma, b)$ on any probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$. Then $X$ is a Markov process, and the semigroup can be described via

\[ Q_t f(x) = E_x[f(X_t)] = \int f(F_x(w)_t)\,dW(w). \]

(Remember that we showed the existence and uniqueness of a strong solution $X$ already.)


Proof. We'll first show that $E[f(X_{s+t}) \mid \mathcal{F}_s] = Q_t f(X_s)$, where $f$ must be a bounded and measurable function. Define the shifted process $X'_t = X_{s+t}$, and write it out as

\[ X'_t = X_s + \int_s^{s+t} \sigma(X_r)\,dB_r + \int_s^{s+t} b(X_r)\,dr = X'_0 + \int_0^t \sigma(X'_r)\,dB'_r + \int_0^t b(X'_r)\,dr. \]

This means that $X'$ solves the SDE $E(\sigma, b)$ on the probability space $(\Omega, \mathcal{F}, \mathcal{F}'_t = \mathcal{F}_{s+t}, P)$ with the shifted Brownian motion $B'_t = B_{s+t} - B_s$. Therefore, Theorem 179 tells us that $F_{X_s}(B')$ must solve the SDE, meaning we have

\[ X' = F_{X_s}(B') \implies E[f(X_{s+t}) \mid \mathcal{F}_s] = E\left[ f(F_{X_s}(B')_t) \mid \mathcal{F}_s \right]. \]

And now $B'$ on the right-hand side is independent of $\mathcal{F}_s$ while $X_s$ is $\mathcal{F}_s$-measurable, so we can write this out as

\[ E[f(X_{s+t}) \mid \mathcal{F}_s] = \int f(F_{X_s}(w)_t)\,dW(w) = Q_t f(X_s), \]

as desired. To finish, we need to show that $Q_t$ is a valid semigroup, which means that (1) $Q_0(x, \cdot)$ is the Dirac measure $\delta_x$ concentrated at $x$, (2) the Chapman-Kolmogorov equations $Q_{s+t} = Q_s Q_t$ are satisfied, and (3) the map $(t, x) \mapsto Q_t(x, A)$ must be measurable. The last point follows from $Q_t$ being continuous in $t$ and $x$, and everything else is straightforward.


Theorem 183

In the same setting as the theorem above, $Q_t$ is a Feller semigroup. The space of functions $A = C_c^2(\mathbb{R})$ that are compactly supported and twice continuously differentiable is contained in the domain $D(L)$, and for any $f \in A$ we have

\[ Lf(x) = b(x) f'(x) + \frac{1}{2} \sigma(x)^2 f''(x). \]


Proof sketch. We'll omit the proof of the Feller property, which is showing that (1) whenever $f \in C_0(\mathbb{R})$, we have $Q_t f \in C_0(\mathbb{R})$, and (2) $Q_t f \to f$ in the sup-norm as $t \downarrow 0$. For the remaining claims, we apply Itô's formula to $f \in A$ and then plug in the SDE to find

\[ df(X_t) = f'(X_t)\,dX_t + \frac{1}{2} f''(X_t)\, \sigma(X_t)^2\,dt = f'(X_t) \big( \sigma(X_t)\,dB_t + b(X_t)\,dt \big) + \frac{1}{2} f''(X_t)\, \sigma(X_t)^2\,dt. \]

Subtracting off the drift term, we can define

\[ M_t = f(X_t) - f(X_0) - \int_0^t \left( f'(X_s)\, b(X_s) + \frac{1}{2} f''(X_s)\, \sigma(X_s)^2 \right) ds, \]

so that $M_t = \int_0^t f'(X_s)\, \sigma(X_s)\,dB_s$ is a local martingale. Call the integrand in the $ds$ integral $Gf(X_s)$; we want to show that $G$ is the generator we're looking for. For simplicity, let's assume $\sigma, b$ are bounded, so that $M$ is a true martingale. We then have

\[ 0 = E_x[M_t] = E_x[f(X_t)] - f(x) - \int_0^t E_x[Gf(X_s)]\,ds. \]

Dividing through by $t$ and taking the limit $t \downarrow 0$, we find that

\[ Lf(x) = \lim_{t \downarrow 0} \frac{Q_t f(x) - f(x)}{t} = \lim_{t \downarrow 0} \frac{1}{t} \int_0^t E_x[Gf(X_s)]\,ds = \lim_{t \downarrow 0} \frac{1}{t} \int_0^t Q_s Gf(x)\,ds. \]

But we know that $Q_s(Gf)$ converges to $Gf$ by the Feller property as $s \downarrow 0$, so this last expression is indeed $Gf(x)$, showing that $G$ is our generator.


In short, we can phrase the results above as saying that time-independent SDEs with Lipschitz coefficients 


correspond to Feller processes. 


Example 184 (Ornstein-Uhlenbeck process)

We'll apply our theory to the SDE $dX_t = dB_t - \lambda X_t\,dt$ (with $\lambda > 0$).

Define the process $M_t = e^{\lambda t} X_t$. By Itô's formula, we have

\[ d(e^{\lambda t} X_t) = e^{\lambda t} (dB_t - \lambda X_t\,dt) + \lambda e^{\lambda t} X_t\,dt. \]

Since the drift terms cancel out, we have $dM_t = e^{\lambda t}\,dB_t$, meaning that $M_t$ is a local martingale with increments given by integrating a deterministic function $e^{\lambda t}$ against a Brownian motion. So if we take $M_0 = X_0$ to be some integrable random variable, $M_t$ will be a true martingale, and

\[ e^{\lambda t} X_t - X_0 = M_t - M_0 = \int_0^t e^{\lambda s}\,dB_s, \]

and rearranging yields our solution $X_t = e^{-\lambda t} X_0 + \int_0^t e^{-\lambda (t - s)}\,dB_s$. In particular, if $X_0$ is deterministic or normal and independent of $B$, then $X_t$ is actually a Gaussian process, where we can calculate

\[ \operatorname{Var}(X_t) = \frac{\operatorname{Var}(X_0)}{e^{2\lambda t}} + \int_0^t \frac{ds}{e^{2\lambda (t - s)}} = \frac{\operatorname{Var}(X_0)}{e^{2\lambda t}} + \frac{1}{2\lambda} \left( 1 - \frac{1}{e^{2\lambda t}} \right). \]

If we now choose $X_0$ so that $\operatorname{Var}(X_0) = \frac{1}{2\lambda}$, then we have $\operatorname{Var}(X_t) = \frac{1}{2\lambda}$ for all $t$. We can also check that $\operatorname{Cov}(X_s, X_t) = \frac{1}{2\lambda} e^{-\lambda |t - s|}$, which only depends on the difference between $s$ and $t$. This means the centered Gaussian process $X$ is also stationary, and such a process is called the Ornstein-Uhlenbeck process.
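The variance formula in Example 184 can be sanity-checked numerically. In this short Python sketch (function names are our own), `ou_variance` evaluates $\operatorname{Var}(X_t)$ from the formula above, `ou_stationary_cov` evaluates the stationary covariance, and `ou_exact_step` performs one exact transition, which follows from the explicit solution since $\int_t^{t+h} e^{-\lambda(t+h-s)}\,dB_s$ is Gaussian with variance $(1 - e^{-2\lambda h})/(2\lambda)$.

```python
import math

def ou_variance(var0, lam, t):
    """Var(X_t) for dX = dB - lam * X dt when Var(X_0) = var0 and X_0 is independent of B."""
    decay = math.exp(-2.0 * lam * t)
    return var0 * decay + (1.0 - decay) / (2.0 * lam)

def ou_stationary_cov(lam, s, t):
    """Cov(X_s, X_t) for the stationary process, i.e. when Var(X_0) = 1 / (2 lam)."""
    return math.exp(-lam * abs(t - s)) / (2.0 * lam)

def ou_exact_step(x, lam, h, z):
    """Exact transition over a time step h, driven by a standard normal sample z."""
    sd = math.sqrt((1.0 - math.exp(-2.0 * lam * h)) / (2.0 * lam))
    return math.exp(-lam * h) * x + sd * z

lam = 0.7
stationary_var = 1.0 / (2.0 * lam)
```

Feeding `ou_exact_step` independent standard normal samples gives a path with exactly the right finite-dimensional distributions on the grid, with no discretisation error.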


Example 185 (Geometric Brownian motion)

Next, consider the SDE $dX_t = \sigma X_t\,dB_t + r X_t\,dt$.

This is the crudest possible model we could have for the stock market (we have some rate of appreciation, as well as some volatility). The idea here is to apply Itô's formula to $\log X_t$ (as long as $X$ stays positive), which yields

\[ d(\log X_t) = \frac{1}{X_t} (\sigma X_t\,dB_t + r X_t\,dt) - \frac{1}{2 X_t^2}\, \sigma^2 X_t^2\,dt. \]

This time the factors of $X_t$ cancel, so we end up with the equation $d(\log X_t) = \sigma\,dB_t + \left( r - \frac{\sigma^2}{2} \right) dt$. Since there is no $X$-dependence on the right side, we can now integrate to get our solution $X_t = X_0 \exp\left( \sigma B_t + \left( r - \frac{\sigma^2}{2} \right) t \right)$.
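As a rough check of the closed form for geometric Brownian motion, we can compare it with an Euler-Maruyama discretisation driven by the same simulated increments. This is a sketch under our own assumptions: the parameter values, the seed, and the function names `gbm_exact` and `gbm_euler` are all illustrative choices.

```python
import math
import random

def gbm_exact(x0, sigma, r, t, b_t):
    """Closed-form solution X_t = X_0 exp(sigma B_t + (r - sigma^2 / 2) t), given B_t = b_t."""
    return x0 * math.exp(sigma * b_t + (r - 0.5 * sigma ** 2) * t)

def gbm_euler(x0, sigma, r, dB, dt):
    """Euler-Maruyama approximation of dX = sigma X dB + r X dt along the increments dB."""
    x = x0
    for inc in dB:
        x += sigma * x * inc + r * x * dt
    return x

random.seed(1)
n_steps, horizon = 10000, 1.0
dt = horizon / n_steps
dB = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

exact = gbm_exact(1.0, 0.2, 0.05, horizon, sum(dB))  # uses B_T = sum of the increments
approx = gbm_euler(1.0, 0.2, 0.05, dB, dt)
```

The two values agree closely for small `dt`. Note that dropping the Itô correction $-\sigma^2/2$ from `gbm_exact` would produce a systematic mismatch with the Euler path.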


22 May 11, 2020 


Today, we'll discuss a few results that are applications of what we've learned in this class, centered around the Dyson 


Brownian motion in random matrices. 


Definition 186

Let $(B_{ij})_{i \le j}$ be a family of iid Brownian motions. Then the symmetric matrix Brownian motion is the symmetric matrix $H$ such that

\[ H_{ij}(t) = \begin{cases} \sqrt{2}\, B_{ii}(t) & i = j, \\ B_{ij}(t) & i < j, \\ B_{ji}(t) & i > j. \end{cases} \]

In other words, all entries evolve according to independent Brownian motions, except that we want the matrix to be symmetric. We include the $\sqrt{2}$ factor here because another way that we can obtain this matrix is via

\[ H(t) = \frac{X(t) + X(t)^*}{\sqrt{2}}, \]

where $X$ is a standard Brownian motion in $\mathbb{R}^{n \times n}$ (meaning all entries are independent) which we symmetrize and rescale. So all off-diagonal entries evolve like standard Brownian motions, but the diagonal terms will have a larger variance.
Definition 187
The Hermitian matrix Brownian motion is similarly defined as

\[ H(t) = \frac{X(t) + X(t)^*}{\sqrt{2}}, \]

where $X$ is a standard Brownian motion in $\mathbb{C}^{n \times n}$.

We say that $H(t)/\sqrt{t}$ for a symmetric BM $H$ is a sample from the Gaussian orthogonal ensemble or GOE, and similarly $H(t)/\sqrt{t}$ for a Hermitian BM $H$ is a sample from the Gaussian unitary ensemble or GUE.

We can show that $H$ sampled from either GOE or GUE will always have $n$ distinct eigenvalues almost surely, which are real because we have a Hermitian matrix, and we'll order them as $\lambda_1 < \cdots < \lambda_n$. In fact, the ordered eigenvalue process $\lambda(t) = (\lambda_1(t), \cdots, \lambda_n(t))$ is such that the eigenvalues never collide almost surely, so this eigenvalue process does not leave the Weyl chamber

\[ W_n = \{ z \in \mathbb{R}^n : z_1 < \cdots < z_n \}. \]


There are two main results we'll be covering today: 
Theorem 188

If $H$ is a symmetric or Hermitian matrix Brownian motion, then the eigenvalue process solves the $\beta$-Dyson SDE

\[ d\lambda_i(t) = \sum_{j \ne i} \frac{\beta\,dt}{\lambda_i(t) - \lambda_j(t)} + \sqrt{2}\,dB_i(t), \]

where the $B_i$ are independent Brownian motions, and where we have $\beta = 1$ in the symmetric case and $\beta = 2$ in the Hermitian case.

Notice that all eigenvalues $\lambda_j < \lambda_i$ give a positive contribution to the drift term, and all eigenvalues $\lambda_j > \lambda_i$ give a negative contribution. So the eigenvalues will "repel" each other more strongly as the eigenvalues grow closer, and this is related to why the eigenvalues are distinct almost surely for the GOE and GUE.
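To see the repulsion in the smallest nontrivial case, we can simulate a $2 \times 2$ symmetric matrix Brownian motion and track the gap $\lambda_2 - \lambda_1$. The sketch below is only an illustration with hypothetical names and parameters of our choosing; it uses the closed-form eigenvalues of $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ rather than a linear algebra library, and a simple random-walk discretisation of the Brownian motions.

```python
import math
import random

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (smaller first) of the symmetric matrix [[a, b], [b, c]]."""
    mid = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mid - rad, mid + rad

def eigenvalue_gap(a, b, c):
    """lambda_2 - lambda_1, computed directly as twice the radius."""
    return 2.0 * math.hypot(0.5 * (a - c), b)

def simulate_gaps(n_steps, dt, seed=0):
    """Gap of a 2x2 symmetric matrix BM started from the zero matrix, after each time step."""
    rng = random.Random(seed)
    a = b = c = 0.0
    gaps = []
    for _ in range(n_steps):
        a += math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)  # diagonal entries: sqrt(2) * BM
        c += math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        b += math.sqrt(dt) * rng.gauss(0.0, 1.0)        # off-diagonal entry: standard BM
        gaps.append(eigenvalue_gap(a, b, c))
    return gaps

gaps = simulate_gaps(2000, 1e-3)
```

In this $2 \times 2$ case the gap equals $\sqrt{(a - c)^2 + 4b^2}$, which is twice the norm of a two-dimensional Brownian motion, so it is consistent with the eigenvalues never colliding at positive times.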

We won't focus too much on the proof of existence and uniqueness for now. It turns out that for any $x \in W_n$ in the Weyl chamber, there is a unique strong solution of the $\beta$-Dyson SDE started from $x$ for all $\beta \ge 1$ which stays inside the chamber for all time. After that, we'll also cover the following result:

Theorem 189

For any $x \in W_n$, the $\beta = 2$ Dyson process started from $x$ is equidistributed as an $n$-dimensional Brownian motion started from $x$, conditioned to stay inside the Weyl chamber.

Note that $n$-dimensional Brownian motion will almost surely exit the Weyl chamber, because even two Brownian motions will intersect almost surely. So we'll need to be more precise about this statement to work with it.


We'll first discuss the main ideas of the first result — most of the work is calculation: 


Proof sketch of Theorem 188. We'll just do the symmetric case, since the method of proof is the same for the Hermitian case. Call the entries of our symmetric matrix $H_{jk}$. We'll calculate the first and second derivative of the $i$th eigenvalue with respect to each matrix entry $H_{jk}$ (for all $j \le k$) and then apply Itô's formula, but we must break into separate cases because the Brownian motions are different on the diagonals. We'll find that

\[ d\lambda_i(t) = \left[ \sum_j \frac{\partial \lambda_i}{\partial H_{jj}} \sqrt{2}\,dB_{jj}(t) + \sum_{j < k} \frac{\partial \lambda_i}{\partial H_{jk}}\,dB_{jk}(t) \right] + \left[ \sum_j \frac{\partial^2 \lambda_i}{\partial H_{jj}^2} + \frac{1}{2} \sum_{j < k} \frac{\partial^2 \lambda_i}{\partial H_{jk}^2} \right] dt. \]

The first bracketed term is the continuous local martingale term, and if we set it equal to $\sqrt{2}\,dB_i(t)$, the Lévy characterization verifies that $B_i$ is indeed a Brownian motion. Similarly, evaluating the partial derivatives on the second bracketed term will show that it matches up with the finite variation term in the theorem statement. So it just remains to explain how we actually calculate the needed derivatives: for $j < k$ we write

\[ \frac{\partial \lambda_i}{\partial H_{jk}} = \frac{d}{ds} \lambda_i(H(s)) \Big|_{s = 0}, \]

where $H(s) = H + s(E_{jk} + E_{kj})$ and $E_{jk}$ and $E_{kj}$ are the "matrix units" which are 0 everywhere except with a 1 in the $(j, k)$ and $(k, j)$ entries, respectively (on the diagonal we perturb by $s E_{jj}$ instead). This can then be computed using implicit differentiation, using the fact that $H u_i = \lambda_i u_i$ for the unit eigenvector $u_i$.


We'll need a technical lemma for the second result: 
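Lemma 190 (Andréief integration formula)

For "nice" functions $f_i$ and $g_i$, we have

\[ \frac{1}{n!} \int_{\mathbb{R}^n} \det \begin{pmatrix} f_1(x_1) & \cdots & f_1(x_n) \\ \vdots & & \vdots \\ f_n(x_1) & \cdots & f_n(x_n) \end{pmatrix} \det \begin{pmatrix} g_1(x_1) & \cdots & g_1(x_n) \\ \vdots & & \vdots \\ g_n(x_1) & \cdots & g_n(x_n) \end{pmatrix} dx = \det \begin{pmatrix} \int f_1 g_1 & \cdots & \int f_1 g_n \\ \vdots & & \vdots \\ \int f_n g_1 & \cdots & \int f_n g_n \end{pmatrix}. \]

Proof. Writing out the definition of the determinant on the left-hand side, we have

\[ \frac{1}{n!} \int_{\mathbb{R}^n} \left( \sum_\sigma \operatorname{sgn}(\sigma) \prod_{i=1}^n f_{\sigma(i)}(x_i) \right) \left( \sum_\tau \operatorname{sgn}(\tau) \prod_{j=1}^n g_{\tau(j)}(x_j) \right) dx, \]

which can also be rewritten as

\[ \frac{1}{n!} \sum_{\sigma, \tau} \operatorname{sgn}(\sigma) \operatorname{sgn}(\tau) \prod_{i=1}^n \left( \int f_{\sigma(i)}(x)\, g_{\tau(i)}(x)\,dx \right) \]

(where we've used that the integral over the product is the product of the integrals over the individual independent $x_i$s). But now each term depends only on the permutation $\pi = \tau \sigma^{-1}$, and each $\pi$ arises from exactly $n!$ pairs $(\sigma, \tau)$, so those factors cancel out and we get the formula of the determinant on the right side.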
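The proof above only uses multilinearity, so the same identity holds if we replace the Lebesgue integral by a finite sum over sample points (this discrete version is our own substitution for the sake of an exact check, not something from the lecture). The Python sketch below compares both sides for $n = 3$; the functions `fs`, `gs` and the points are arbitrary illustrative choices.

```python
from itertools import permutations, product
from math import factorial

def det(M):
    """Determinant via the Leibniz formula (fine for the tiny matrices used here)."""
    n = len(M)
    total = 0.0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign  # each inversion flips the sign
        term = 1.0
        for i in range(n):
            term *= M[i][perm[i]]
        total += sign * term
    return total

def andreief_lhs(fs, gs, points):
    """(1/n!) * sum over all n-tuples x of det[f_i(x_j)] * det[g_i(x_j)]."""
    n = len(fs)
    total = 0.0
    for xs in product(points, repeat=n):
        F = [[f(x) for x in xs] for f in fs]
        G = [[g(x) for x in xs] for g in gs]
        total += det(F) * det(G)
    return total / factorial(n)

def andreief_rhs(fs, gs, points):
    """det of the matrix with entries sum_x f_i(x) g_j(x)."""
    return det([[sum(f(x) * g(x) for x in points) for g in gs] for f in fs])

fs = [lambda x: 1.0, lambda x: x, lambda x: x * x]
gs = [lambda x: 1.0, lambda x: x, lambda x: x ** 3]
points = [-1.0, 0.5, 2.0, 3.0]

lhs = andreief_lhs(fs, gs, points)
rhs = andreief_rhs(fs, gs, points)
```

In this discrete form the statement is an instance of the Cauchy-Binet formula, since tuples with a repeated point contribute zero to the left-hand side.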


Lemma 191 (Karlin-McGregor formula)
Let $B(t)$ be a Brownian motion in $\mathbb{R}^n$, and let $T$ be the first exit time of $B$ from the Weyl chamber. Then for any $x \in W_n$ and measurable subset $A \subset W_n$,

\[ P_x(B(t) \in A;\ T > t) = \int_A \det \begin{pmatrix} p_t(x_1, y_1) & \cdots & p_t(x_1, y_n) \\ \vdots & & \vdots \\ p_t(x_n, y_1) & \cdots & p_t(x_n, y_n) \end{pmatrix} dy, \]

where $p_t$ is the transition density of a usual one-dimensional Brownian motion.

If the BMs were all independent and we didn't care whether they collided or not, the transition kernel would just be $p_t(x_1, y_1) \cdots p_t(x_n, y_n)$. So this result is saying that we need to use the determinant of the $p_t$s instead if we require our BMs not to collide. (And here, we're restricting on the left side to the event that we haven't left by time $t$, not conditioning.)


Proof. Let $T_i$ be the collision time $\inf\{ t : B_i(t) = B_{i+1}(t) \}$, so that $T = \min_i T_i$. Again expanding out the determinant and writing the transition functions $p_t$ in terms of a Brownian motion yields (letting $q_t(x, y)$ be the determinant)

\[ q_t(x, y)\,dy = \sum_\sigma \operatorname{sgn}(\sigma)\, E_x\left[ \prod_{i=1}^n 1\{ B_i(t) \in dy_{\sigma(i)} \} \left( 1\{T > t\} + \sum_{j} 1\{ T = T_j \le t \} \right) \right], \]

where the bracketed term we've inserted is just 1 (almost surely). But when we look at the contribution from the $1\{T > t\}$ term (meaning we haven't left the chamber) the only permutation that is relevant is that where the $y_i$s haven't gone out of order from the $x_i$s, so this contributes $E_x\left[ \prod_{i=1}^n 1\{ B_i(t) \in dy_i \};\ T > t \right]$. And we can make a swapping argument to show that the total contribution from the other part is zero (if $T_j$ happens before $t$, we can take the two Brownian motions that cross and switch them, changing the sign of the corresponding term). So the $\{T > t\}$ term is the only contribution overall, and integrating it over $A$ yields the left-hand side as desired.
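For $n = 2$ the Karlin-McGregor determinant can be checked against a classical one-dimensional computation: the difference $B_2 - B_1$ is a Brownian motion with variance $2t$, so by the reflection principle the probability of no collision before time $t$ is $\operatorname{erf}\big( (x_2 - x_1) / (2\sqrt{t}) \big)$. The Python sketch below (names, grid size and tolerance are our own choices) integrates the determinant over $\{y_1 < y_2\}$ with a crude midpoint rule and compares.

```python
import math

def p(t, x, y):
    """Transition density of one-dimensional Brownian motion."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def km_density(t, x, y):
    """Karlin-McGregor determinant for n = 2: det [p_t(x_i, y_j)]."""
    (x1, x2), (y1, y2) = x, y
    return p(t, x1, y1) * p(t, x2, y2) - p(t, x1, y2) * p(t, x2, y1)

def survival_probability(t, x, h=0.05, half_width=8.0):
    """Midpoint-rule approximation of the integral of km_density over {y1 < y2}."""
    lo = min(x) - half_width
    m = int(round((max(x) - min(x) + 2.0 * half_width) / h))
    grid = [lo + (k + 0.5) * h for k in range(m)]
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):  # only grid cells with y1 < y2
            total += km_density(t, x, (grid[i], grid[j]))
    return total * h * h

numeric = survival_probability(1.0, (0.0, 1.0))
closed_form = math.erf(1.0 / (2.0 * math.sqrt(1.0)))
```

The determinant vanishes when $y_1 = y_2$ and is positive inside the chamber, which is what we expect from a density restricted to the non-collision event.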


In order to condition on “not colliding,” we'll construct a martingale: 


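Corollary 192

For any $x \in \mathbb{R}^n$, define the classic Vandermonde determinant

\[ v(x) = \prod_{i < j} (x_j - x_i) = \det \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{pmatrix}. \]

Then if $B$ is a Brownian motion in $\mathbb{R}^n$ and $T$ is the exit time from the Weyl chamber, then $M_t = v(B_{t \wedge T})$ is a nonnegative martingale.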


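Proof sketch. $M_t$ is indeed nonnegative because each term $(x_j - x_i)$ in the product is nonnegative while we're in the Weyl chamber. Consider the expectation $E_x[v(B_{t \wedge T})]$: if $T \le t$, then $M_t = 0$ (because $v$ is zero when it hits the boundary of the chamber) and there is no contribution to the expectation. So taking $(y_1, \cdots, y_n)$ to be the location of the Brownian motion at time $t$, we can just calculate

\[ E_x[v(B_{t \wedge T})] = E_x[v(B_t);\ T > t] = \int_{W_n} \det \begin{pmatrix} 1 & y_1 & \cdots & y_1^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & y_n & \cdots & y_n^{n-1} \end{pmatrix} \det \begin{pmatrix} p_t(x_1, y_1) & \cdots & p_t(x_1, y_n) \\ \vdots & & \vdots \\ p_t(x_n, y_1) & \cdots & p_t(x_n, y_n) \end{pmatrix} dy, \]

where we've used the Karlin-McGregor formula. But symmetry in the variables here means we can integrate over all of $\mathbb{R}^n$ instead of $W_n$ by adding a factor of $\frac{1}{n!}$, and now the Andréief integration formula yields

\[ E_x[v(B_{t \wedge T})] = \det \begin{pmatrix} 1 & x_1 & E[(x_1 + \sqrt{t} Z)^2] & \cdots & E[(x_1 + \sqrt{t} Z)^{n-1}] \\ 1 & x_2 & E[(x_2 + \sqrt{t} Z)^2] & \cdots & E[(x_2 + \sqrt{t} Z)^{n-1}] \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & E[(x_n + \sqrt{t} Z)^2] & \cdots & E[(x_n + \sqrt{t} Z)^{n-1}] \end{pmatrix}, \]

where $Z \sim N(0, 1)$ (for example, by definition of the transition kernel, integrating $y$ against $p_t(x_1, y)$ will just yield $x_1$). And now this is just equal to the simpler Vandermonde determinant

\[ \det \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{pmatrix} = v(x) \]

because we can expand out the rightmost columns by the binomial theorem and use column operations to subtract off lower powers. And this is basically the martingale identity we want once we use the Markov property.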
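Another way to make the martingale property plausible is the standard fact (not proved in these notes) that the Vandermonde determinant is a harmonic function on $\mathbb{R}^n$, so that Itô's formula applied to $v(B_t)$ has no drift term. The Python sketch below checks $\Delta v = 0$ with central second differences at one test point; the function names and the test point are our own choices. For $n \le 4$ the polynomial $v$ has degree at most 3 in each variable, so the second differences are exact up to rounding.

```python
from itertools import combinations

def vandermonde(x):
    """v(x) = product over i < j of (x_j - x_i)."""
    result = 1.0
    for i, j in combinations(range(len(x)), 2):
        result *= x[j] - x[i]
    return result

def discrete_laplacian(func, x, h=0.1):
    """Sum over coordinates of the central second difference of func at x."""
    total = 0.0
    for k in range(len(x)):
        up = list(x)
        up[k] += h
        down = list(x)
        down[k] -= h
        total += (func(up) - 2.0 * func(x) + func(down)) / (h * h)
    return total

point = [0.3, 1.1, 2.0, 3.7]  # a point inside the Weyl chamber W_4
laplacian_of_v = discrete_laplacian(vandermonde, point)
# For comparison, |x|^2 is not harmonic: its Laplacian is 2n.
laplacian_of_square = discrete_laplacian(lambda y: sum(c * c for c in y), point)
```

Harmonicity alone only gives a local martingale, which is why the proof sketch above computes the expectation directly.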
Proposition 193
Let $B(t)$ be a Brownian motion started from $x \in W_n$. Then there exists a unique measure $Q$ such that for all stopping times $S < \infty$,

\[ \frac{dQ|_{\mathcal{F}_S}}{dP|_{\mathcal{F}_S}} = \frac{M_S}{M_0} = \frac{v(B_{S \wedge T})}{v(x)}. \]

Then the law of $B$ under $Q$ can be thought of as Brownian motion conditioned not to exit $W_n$. It is a Feller process with generator given by $Lf(x) = \langle \nabla \log v(x), \nabla f(x) \rangle + \frac{1}{2} \Delta f(x)$.


Proof. Let R; be the first time that M; > 7. Then M® is a bounded martingale, so the optional stopping theorem 
tells us that the exit time satisfies P(T > Ri) = oe Define the measure 


\[
\frac{dQ_i}{dP|_{\mathcal{F}_{R_i}}} = \frac{M_{R_i}}{v(x)} = \frac{i \cdot 1\{T > R_i\}}{v(x)}
\]


(notice that this is zero if we exit the chamber before hitting $i$). Because $M$ is a martingale, the $Q_i$ are consistent with each other, so there is a measure $Q$ whose restrictions to $\mathcal{F}_{R_i}$ are consistent with the $Q_i$s, and it will have the correct values of $\frac{dQ|_{\mathcal{F}_S}}{dP|_{\mathcal{F}_S}}$. But now we have


\[
P(A \mid T > R_i) = \frac{P(A; T > R_i)}{v(x)/i} = E\left( 1_A \cdot \frac{i \cdot 1\{T > R_i\}}{v(x)} \right) = Q(A)
\]

by definition of the Radon-Nikodym derivative (here $A$ is any event in $\mathcal{F}_{R_i}$). So as we take $i \to \infty$, we can think of this as conditioning on the Brownian motion never exiting the chamber (since it takes an arbitrarily long time to travel arbitrarily long distances).
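
The optional stopping step in this proof can be spelled out as follows. This is a short sketch under the assumptions that $i > v(x)$, that $T < \infty$ almost surely, and that $M$ has continuous paths, so that $M_{R_i} = i$ on $\{R_i < T\}$ and $M_T = 0$ on $\{T < R_i\}$:

\[
\begin{aligned}
v(x) = E[M_0] &= E[M_{R_i \wedge T}] \\
&= i \cdot P(R_i < T) + 0 \cdot P(T < R_i) \\
&= i \cdot P(T > R_i).
\end{aligned}
\]

So $P(T > R_i) = v(x)/i$, and in particular $\frac{i \cdot 1\{T > R_i\}}{v(x)}$ has $P$-expectation $1$, as a Radon-Nikodym derivative should.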


We can now find the generator by noting that 


\[
Lf(x) = \lim_{t \downarrow 0} \frac{1}{t} \left[ E_Q[f(B(t)) \mid B(0) = x] - f(x) \right].
\]
Applying the change of measure, this can be written in terms of P as 


\[
\lim_{t \downarrow 0} \frac{1}{t} \left( E_P\left( f(B(t)) \frac{M(t)}{v(x)} \,\middle|\, B(0) = x \right) - f(x) \right).
\]
Now $v(x)$ is a constant, and Itô's formula tells us that

\[
d(f(B(t)) M(t)) = f(B(t)) \, dM_t + f'(B(t)) M_t \, dB_t + \frac{1}{2} M_t f''(B(t)) \, dt + f'(B(t)) \, d\langle B, M \rangle_t.
\]


We can ignore the martingale terms because we're taking expectations; $M_t$ has mean $v(x)$, and $d\langle B, M \rangle_t = \nabla v(B_t) \, dt$. So plugging this back in yields


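\[
Lf(x) = \frac{1}{2} \Delta f(x) + \left\langle \frac{\nabla v(x)}{v(x)}, \nabla f(x) \right\rangle,
\]

and we're done because $\frac{\nabla v(x)}{v(x)} = \nabla \log v(x)$.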


Proof of Theorem 189. Recall that the $\beta = 2$ Dyson SDE is


1 
dXj(t) = ay ND One | /2dB;(t). 


Apply a rescaling @ = a5 so that we have the equation 


\[
d\theta_i(t) = \sum_{j \ne i} \frac{dt}{\theta_i(t) - \theta_j(t)} + dB_i(t).
\]

This SDE gives us a generator

\[
Lf = \sum_{i} \sum_{j \ne i} \frac{1}{x_i - x_j} \frac{\partial f}{\partial x_i} + \frac{1}{2} \Delta f,
\]

and we can check this is indeed $\langle \nabla \log v(x), \nabla f(x) \rangle + \frac{1}{2} \Delta f$, which is the generator for the non-colliding Brownian motion. Since the generators are the same, the two processes are equal in law, as desired.
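
The identity behind this check is that the $i$-th component of $\nabla \log v(x)$ equals $\sum_{j \ne i} \frac{1}{x_i - x_j}$. The following is a minimal numerical sketch, assuming NumPy and the convention $v(x) = \prod_{i<j} (x_i - x_j)$ with $x_1 > \cdots > x_n$; the function names and the test point are hypothetical choices for illustration. It compares a finite-difference gradient of $\log v$ with the drift of the rescaled SDE.

```python
import numpy as np

def log_vandermonde(x):
    """log v(x) for v(x) = prod_{i<j} (x_i - x_j), assuming x_1 > ... > x_n."""
    n = len(x)
    return sum(np.log(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))

def drift(x):
    """Drift vector with i-th entry sum over j != i of 1 / (x_i - x_j)."""
    diffs = x[:, None] - x[None, :]
    np.fill_diagonal(diffs, np.inf)  # 1/inf = 0 removes the j = i term
    return (1.0 / diffs).sum(axis=1)

def numerical_gradient(func, x, h=1e-6):
    """Central finite-difference approximation to the gradient of func at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

# Illustrative (hypothetical) point inside the chamber x_1 > x_2 > x_3 > x_4.
x = np.array([2.0, 1.0, -0.5, -1.5])

grad_log_v = numerical_gradient(log_vandermonde, x)
print(grad_log_v)
print(drift(x))
```

The two printed vectors should agree up to finite-difference error, which matches the first-order term of the generator with $\langle \nabla \log v(x), \nabla f(x) \rangle$.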


As a final note, the reason we care about this identification is that when we consider some symmetric random matrix $X$, we may want to show that it has spectral statistics like those of the GOE (this has to do with universality theorems for random matrices). To show this, we construct a flow on the matrices, where $H(0) = X$ and $H(t)$ evolves via an Ornstein-Uhlenbeck process (it's like a Brownian motion, but with a drift that gives it a stationary distribution). Then $H(\infty)$ looks like the stationary distribution for Ornstein-Uhlenbeck, which is the GOE, and we can bound


\[
|E[F(X)] - E[F(H)]| \le |E[F(H(0))] - E[F(H(t))]| + |E[F(H(t))] - E[F(H(\infty))]|
\]


for a GOE matrix H. If we take t very small, the first term is small by perturbation theory, but the surprising fact is 
that the Dyson process mixes very quickly, so we can also control the second term (this is called the fast mixing of 


the Dyson process). And many references for further study can be found on the official course website! 